An Efficient Implementation of Double Precision 1-D FFT for GPUs Using CUDA
School of Computer Science and Technology, Anhui University, Hefei 230039, China
Journal of Information & Computational Science 9:2 (2012) 387–394
@article{liua2012efficient,
title={An Efficient Implementation of Double Precision 1-D FFT for GPUs Using CUDA},
author={Liu, Y. and Guo, L. and Luo, B. and Zhang, X.},
journal={Journal of Information \& Computational Science},
volume={9},
number={2},
pages={387--394},
year={2012}
}
The Fast Fourier Transform (FFT) is a well-known and widely used tool in many scientific and engineering fields. CUFFT, NVIDIA's FFT library included in the CUDA toolkit, supports double-precision FFTs; however, its implementation is not very efficient. In this paper, we implement an efficient double-precision Cooley-Tukey FFT for GPUs using CUDA. Several programming techniques are employed to exploit the hardware characteristics: on-chip shared memory utilization, removal of redundant computation, and coalesced global memory access. Experiments show that our 1-D FFT is as fast as CUFFT and, for small input sizes, more than twice as fast.
February 17, 2012 by hgpu