Correctly rounding elementary functions on GPU
UPMC Univ Paris 06 and CNRS UMR 7606, LIP6
arXiv:1211.3056 [cs.MS] (13 Nov 2012)
The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which entails a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely Lefèvre's algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with partial SIMD (Single Instruction, Multiple Data) execution. We first propose an analysis of Lefèvre's hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPU thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required in Lefèvre's algorithm. In the end, we obtain overall speedups of up to 53.4x on one GPU over a sequential CPU execution, and up to 7.1x over a multi-core CPU.
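To make "hard-to-round argument" concrete, here is a minimal Python sketch of a naive exhaustive search on a toy format. It is not Lefèvre's algorithm and not the GPU method of the paper. It only illustrates the quantity such searches look for: the argument whose function value falls closest to a rounding breakpoint. The function exp, the toy precision p, the input interval [1/2, 1), round-to-nearest breakpoints (midpoints), and the helper names midpoint_distance and hardest_to_round are all assumptions chosen for illustration.

```python
from decimal import Decimal, localcontext


def midpoint_distance(k, p, work_digits=50):
    """Distance (in ulps) between exp(k / 2**p) and the nearest midpoint of
    two consecutive p-bit floating-point numbers. Assumes k / 2**p >= 1/2."""
    with localcontext() as ctx:
        ctx.prec = work_digits  # working precision far above the toy format
        x = Decimal(k) / Decimal(2 ** p)
        y = x.exp()
        # Find the binade of y: 2**e <= y < 2**(e + 1). Here y >= 1.
        e = 0
        while y >= Decimal(2) ** (e + 1):
            e += 1
        # Scale so that one ulp of the p-bit result equals 1.
        scaled = y * Decimal(2) ** (p - 1 - e)
        frac = scaled % 1
        # Midpoints sit at fractional part 1/2 after scaling.
        return abs(frac - Decimal("0.5"))


def hardest_to_round(p=10):
    """Scan every p-bit argument in [1/2, 1) and return (k, distance) for the
    argument k / 2**p whose exp is closest to a midpoint."""
    candidates = range(2 ** (p - 1), 2 ** p)
    best_k = min(candidates, key=lambda k: midpoint_distance(k, p))
    return best_k, midpoint_distance(best_k, p)


if __name__ == "__main__":
    k, d = hardest_to_round(10)
    print(f"hardest argument: {k}/2**10, distance to midpoint: {d:.3e} ulp")
```

This scan costs one high-precision evaluation per argument, which is hopeless for the 2^52 or more arguments per binade of a double-precision format. That cost is why faster searches such as the one studied in the paper matter.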
November 14, 2012 by hgpu