high performance computing on graphics processing units: hgpu.org

Correctly rounding elementary functions on GPU

Pierre Fortin, Mourad Gouicem, Stef Graillat

UPMC Univ Paris 06 and CNRS UMR 7606, LIP6

arXiv:1211.3056 [cs.MS] (13 Nov 2012)

@article{2012arXiv1211.3056F,
  author={{Fortin}, P. and {Gouicem}, M. and {Graillat}, S.},
  title={{Correctly rounding elementary functions on GPU}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1211.3056},
  primaryClass={cs.MS},
  keywords={Computer Science – Mathematical Software, Computer Science – Numerical Analysis},
  year={2012},
  month={nov},
  adsurl={http://adsabs.harvard.edu/abs/2012arXiv1211.3056F},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which demands a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre’s algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with partial SIMD (Single Instruction Multiple Data) execution. We first propose an analysis of Lefèvre’s hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPUs thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required in Lefèvre’s algorithm. Overall, we obtain speedups of up to 53.4x on one GPU over a sequential CPU execution, and up to 7.1x over a multi-core CPU.
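
To make the search problem concrete, consider the usual setting of Lefèvre’s algorithm. There, f is replaced on each small subinterval by a degree-1 polynomial, and the arguments are scaled to integers k. Finding hard-to-round candidates then amounts to finding the k for which a·k + b lies very close to an integer. The Python sketch below is only a naive O(n) reference scan under that assumption. It is neither Lefèvre’s algorithm nor the paper’s GPU search, both of which avoid visiting every k. The sketch also includes a helper that expands the slope a into continued-fraction partial quotients. Those quotients govern how the values a·k mod 1 are spread out (the three-distance theorem), which is why continued fractions are a natural tool for analyzing the search. The names hard_to_round_candidates and continued_fraction, and the use of exact Fraction inputs, are illustrative assumptions.

```python
from fractions import Fraction


def continued_fraction(x):
    """Partial quotients [a0; a1, a2, ...] of a rational number x."""
    x = Fraction(x)
    quotients = []
    while True:
        q = x.numerator // x.denominator  # floor(x), exact for Fractions
        quotients.append(q)
        x -= q                            # fractional part, in [0, 1)
        if x == 0:
            return quotients              # rational input: expansion terminates
        x = 1 / x                         # continue with the reciprocal


def hard_to_round_candidates(a, b, n, d):
    """Return every integer k in [0, n) such that a*k + b is within d of an integer.

    Hypothetical naive reference: it visits all n values of k, whereas
    Lefèvre's algorithm finds the same set far faster. a, b and d should be
    exact Fractions so that no rounding error enters the test itself.
    """
    hits = []
    for k in range(n):
        frac = (a * k + b) % 1       # fractional part of a*k + b, in [0, 1)
        dist = min(frac, 1 - frac)   # distance to the nearest integer
        if dist < d:
            hits.append(k)
    return hits


if __name__ == "__main__":
    print(continued_fraction(Fraction(355, 113)))
    print(hard_to_round_candidates(Fraction(1, 7), Fraction(1, 20), 14, Fraction(1, 10)))
```

For example, continued_fraction(Fraction(355, 113)) returns [3, 7, 16]. hard_to_round_candidates(Fraction(1, 7), Fraction(1, 20), 14, Fraction(1, 10)) returns [0, 6, 7, 13]. Exact rational arithmetic is used here because a floating-point distance test would itself suffer from the rounding problem being studied.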

Tags: Algorithms, Computer science, CUDA, Elementary functions, nVidia, Tesla C2070

November 14, 2012 by hgpu
