Fast Mersenne prime testing on the GPU
Alma College, Alma, MI
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, 2011
@inproceedings{thall2011fast,
title={Fast Mersenne prime testing on the GPU},
author={Thall, A.},
booktitle={Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units},
pages={6},
year={2011},
organization={ACM}
}
The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform approach (IBDWT) using balanced-integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas uses the CUDA programming language and requires the double-precision floating point capabilities of recent GPUs. Results show up to 7x speedups over benchmark averages for optimized sequential code and factor-of-two speedups over CUDALucas, another GPU-based Lucas-Lehmer tester developed independently and with a different optimization strategy. This work demonstrates techniques for implementing GPU-based number theoretic algorithms on very large numbers, including fast multiplication, prefix-sum-based carry-propagation, and the use of carry-save arithmetic with balanced integers. The work presents timing profiles of convolution-based integer multiplication based on the IBDWT, in particular for non-power-of-two transformations, and establishes the usefulness of the software as a GPU benchmarking application and as a platform for large-integer and polynomial experimentation.
September 15, 2011 by hgpu