## Three Dimensional Fast Fourier Transform CUDA Implementation

Department of Computer Science and Engineering, University of California, San Diego

University of California, CSE260 Project Report, 2012

@article{aatish2012three,

title={Three Dimensional Fast Fourier Transform CUDA Implementation},

author={Aatish, Kumar and Zhang, Boyan},

year={2012}

}

A 3 dimensional DFT can be expressed as 3 DFTs on a 3 dimensional data along each dimension. Each of these 1 dimensional DFTs can be computed efficiently owing to the properties of the transform. This class of algorithms is known as the Fast Fourier Transform (FFT). We introduce the one dimensional FFT algorithm in this section, which will be used in our GPU implementation. Our implementation does an FFT transform in the row major dimension of a given three dimensional matrix at a time. Thus, the complete 3D FFT is a set of 1D FFT kernels and transpose kernels which bring a desired coordinate axis to the row major format to enable coalesced global reads. The implemented kernel performs a single precision 1D FFT and uses the fast math functions for calculating the sin and cos of the phases corresponding to twiddle factors. The GPU used is the C2050 Fermi GPU on the DIRAC cluster on the Carver system provided by the National Energy Research Scientific Computing Center. The NVIDIA Tesla C2050 has 448 parallel CUDA processor cores with 3 GB of memory.

January 15, 2013 by hgpu