Correctly rounding elementary functions on GPU
UPMC Univ Paris 06 and CNRS UMR 7606, LIP6
arXiv:1211.3056 [cs.MS] (13 Nov 2012)
@article{2012arXiv1211.3056F,
author={{Fortin}, P. and {Gouicem}, M. and {Graillat}, S.},
title={{Correctly rounding elementary functions on GPU}},
journal={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1211.3056},
primaryClass={cs.MS},
keywords={Computer Science - Mathematical Software, Computer Science - Numerical Analysis},
year={2012},
month={nov},
adsurl={http://adsabs.harvard.edu/abs/2012arXiv1211.3056F},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which demands a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre’s algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with partial SIMD (Single Instruction, Multiple Data) execution. We first propose an analysis of Lefèvre’s hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPU thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required by Lefèvre’s algorithm. In the end, we obtain overall speedups of up to 53.4x on one GPU over a sequential CPU execution, and up to 7.1x over a multi-core CPU.
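To make the notion of a hard-to-round argument concrete, the following minimal Python sketch searches a toy floating-point format exhaustively for the argument whose exponential lies closest to a rounding midpoint. It is a brute-force illustration only and is not Lefèvre's algorithm or the GPU search described in the paper; the function names `midpoint_distance` and `hardest_to_round`, the choice of `exp`, the interval [0.5, 1), and the small precision `p` are all assumptions made for the example.

```python
from decimal import Decimal, getcontext

# Work with far more digits than the toy format needs (assumed sufficient here).
getcontext().prec = 60

def midpoint_distance(y: Decimal, p: int) -> Decimal:
    """Distance, in ulps of a p-bit significand, from y > 0 to the nearest
    round-to-nearest midpoint. A value of 0 means y sits exactly on a midpoint."""
    # Normalize y into [1, 2) so that the significand scaling is uniform.
    while y >= 2:
        y /= 2
    while y < 1:
        y *= 2
    scaled = y * (Decimal(2) ** (p - 1))  # one unit == one ulp
    frac = scaled - int(scaled)           # fractional part in [0, 1)
    return abs(frac - Decimal("0.5"))

def hardest_to_round(p: int):
    """Brute-force search over all p-bit arguments x in [0.5, 1) for the one
    whose exp(x) is closest to a midpoint (hypothetical toy search)."""
    best = None
    for m in range(2 ** (p - 1), 2 ** p):
        x = Decimal(m) / Decimal(2 ** p)
        d = midpoint_distance(x.exp(), p)
        if best is None or d < best[0]:
            best = (d, x)
    return best

if __name__ == "__main__":
    distance, argument = hardest_to_round(10)
    print("hardest argument:", argument, "distance to midpoint (ulps):", distance)
```

The cost of this naive loop doubles with every extra bit of precision, which is why exhaustive search is impractical for double precision and why faster searches such as the one accelerated in the paper are needed.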
November 14, 2012 by hgpu