AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures

Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
The University of Texas at Austin, Austin, TX
The University of Texas at Austin, 2015


We present a new library for scalable 3-D Fast Fourier Transforms (FFTs). Despite the large body of existing work on 3-D FFTs, we show that significant speedups can still be achieved for large problem sizes and core counts. The importance of the FFT in science and engineering, together with advances in high-performance computing, calls for further improvements to existing technologies. The new library extends existing single-node FFT libraries for x86 architectures (CPUs) and CUDA-enabled Graphics Processing Units (GPUs) to distributed-memory clusters using the Message Passing Interface (MPI). It uses an optimized all-to-all communication scheme for both slab and pencil partitioning, on CPUs as well as GPUs. We present numerical results on the Maverick and Stampede platforms at the Texas Advanced Computing Center (TACC) and on the Titan system at Oak Ridge National Laboratory (ORNL). We compare against the FFTW and P3DFFT libraries and show favorable performance across a range of processor counts and problem sizes. As a highlight from one of our strong-scaling experiments, our GPU-accelerated FFT is 4x faster than P3DFFT for a 2048^3 problem on 4,096 nodes of Titan, using 4,096 GPUs.
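To make the slab partitioning and all-to-all step mentioned in the abstract concrete, here is a minimal single-process sketch of the textbook slab scheme. It is not AccFFT's API and is not taken from the paper: the function names (`dft`, `dft_axis`, `slab_fft3d`, `gather`) are hypothetical, the "ranks" are plain Python lists rather than MPI processes, and a naive O(n^2) DFT stands in for the node-local FFT library (FFTW or cuFFT in practice). The abstract does not spell out the algorithm, so the three steps below are an assumption based on the standard approach: transform the two locally complete axes, redistribute the data with an all-to-all exchange, then transform the remaining axis.

```python
import cmath

def dft(x):
    """Naive O(n^2) 1-D DFT; a stand-in for a node-local FFT library call."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_axis(a, axis):
    """Apply the 1-D DFT along one axis of a nested-list 3-D block."""
    n = (len(a), len(a[0]), len(a[0][0]))
    out = [[[0j] * n[2] for _ in range(n[1])] for _ in range(n[0])]
    o0, o1 = [d for d in range(3) if d != axis]
    for u in range(n[o0]):
        for v in range(n[o1]):
            idx = [0, 0, 0]
            idx[o0], idx[o1] = u, v
            line = []
            for t in range(n[axis]):
                idx[axis] = t
                line.append(a[idx[0]][idx[1]][idx[2]])
            for t, val in enumerate(dft(line)):
                idx[axis] = t
                out[idx[0]][idx[1]][idx[2]] = val
    return out

def slab_fft3d(a, p):
    """Emulate a slab-decomposed 3-D DFT over p hypothetical ranks."""
    n0, n1 = len(a), len(a[0])
    assert n0 % p == 0 and n1 % p == 0, "p must divide the first two dimensions"
    s0, s1 = n0 // p, n1 // p
    # Initial layout: rank r owns s0 consecutive planes along axis 0.
    slabs = [a[r * s0:(r + 1) * s0] for r in range(p)]
    # Step 1: axes 1 and 2 are complete on every rank, so transform them locally.
    slabs = [dft_axis(dft_axis(s, 2), 1) for s in slabs]
    # Step 2: emulated all-to-all. Rank q receives, from every rank r,
    # the part of r's slab whose axis-1 index lies in [q*s1, (q+1)*s1).
    recv = [[plane[q * s1:(q + 1) * s1] for r in range(p) for plane in slabs[r]]
            for q in range(p)]
    # Step 3: axis 0 is now complete on every rank; transform it locally.
    return [dft_axis(block, 0) for block in recv]

def gather(blocks):
    """Reassemble the full array from blocks that are split along axis 1."""
    return [[row for b in blocks for row in b[i]] for i in range(len(blocks[0]))]

# Small demonstration: a 4 x 4 x 4 complex array split over 2 emulated ranks.
n, p = 4, 2
a = [[[complex(i + 2 * j + 3 * k, i * j - k) for k in range(n)]
      for j in range(n)] for i in range(n)]
result = gather(slab_fft3d(a, p))
# Reference: the same separable transform applied to the undivided array.
reference = dft_axis(dft_axis(dft_axis(a, 2), 1), 0)
err = max(abs(result[i][j][k] - reference[i][j][k])
          for i in range(n) for j in range(n) for k in range(n))
print("max abs difference vs. undivided transform:", err)
```

In a real distributed run, Step 2 is the only stage that moves data between nodes, which is why the abstract singles out the all-to-all exchange as the part being optimized. A pencil decomposition splits two axes instead of one, so it needs two such exchanges but can use more ranks than there are planes along any single axis.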