AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures

Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
The University of Texas at Austin, Austin, TX
The University of Texas at Austin, 2015

We present a new library for scalable 3-D Fast Fourier Transforms (FFTs). Despite the large body of existing work on 3-D FFTs, we show that significant speedups can still be achieved for large problem sizes and core counts. The importance of the FFT in science and engineering, together with advances in high-performance computing, necessitates further improvements to existing technologies. The new library extends existing FFT libraries for x86 architectures (CPUs) and CUDA-enabled Graphics Processing Units (GPUs) to distributed-memory clusters using the Message Passing Interface (MPI). Our library uses an optimized all-to-all communication scheme for slab and pencil decompositions on both CPUs and GPUs. We present numerical results on the Maverick and Stampede platforms at the Texas Advanced Computing Center (TACC) and on the Titan system at the Oak Ridge National Laboratory (ORNL). We compare with the FFTW and P3DFFT libraries and show favorable performance across a range of processor counts and problem sizes. As a highlight from one of our strong-scaling experiments, our GPU-accelerated FFT is 4x faster than P3DFFT for a 2048^3 problem on 4,096 nodes of Titan using 4,096 GPUs.
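
The slab decomposition mentioned in the abstract can be illustrated with a small, single-process sketch. It is not AccFFT code and does not use its API: the function names (`dft`, `dft_axis`, `slab_fft3d`) are hypothetical, the "ranks" are plain Python lists rather than MPI processes, and a naive O(n^2) DFT stands in for the tuned local FFT a real library would call. The sketch only shows the data movement: local transforms on each slab, one all-to-all exchange, then a transform along the remaining axis.

```python
import cmath

def dft(xs):
    """Naive O(n^2) 1-D DFT; a stand-in for a tuned local FFT call."""
    n = len(xs)
    return [sum(xs[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_axis(a, axis):
    """Apply dft along one axis of a nested list indexed as a[i][j][k]."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    if axis == 0:
        for j in range(n1):
            for k in range(n2):
                col = dft([a[i][j][k] for i in range(n0)])
                for i in range(n0):
                    out[i][j][k] = col[i]
    elif axis == 1:
        for i in range(n0):
            for k in range(n2):
                col = dft([a[i][j][k] for j in range(n1)])
                for j in range(n1):
                    out[i][j][k] = col[j]
    else:
        for i in range(n0):
            for j in range(n1):
                out[i][j] = dft(a[i][j])
    return out

def slab_fft3d(a, p):
    """Simulate a slab-decomposed 3-D DFT over p 'ranks' (lists, no MPI)."""
    n0, n1 = len(a), len(a[0])
    assert n0 % p == 0 and n1 % p == 0, "axes 0 and 1 must divide evenly"
    s0, s1 = n0 // p, n1 // p
    # Step 1: rank r owns planes r*s0 .. (r+1)*s0 - 1 along axis 0 and
    # transforms axes 2 and 1, which are entirely local to its slab.
    local = [dft_axis(dft_axis(a[r * s0:(r + 1) * s0], 2), 1) for r in range(p)]
    # Step 2: all-to-all exchange. Rank r collects, from every sender q, the
    # rows with axis-1 indices r*s1 .. (r+1)*s1 - 1, so it holds all of axis 0.
    recv = []
    for r in range(p):
        slab = []
        for q in range(p):
            for plane in local[q]:
                slab.append(plane[r * s1:(r + 1) * s1])
        recv.append(slab)  # shape: n0 x s1 x n2
    # Step 3: axis 0 is now local on every rank, so transform it.
    done = [dft_axis(slab, 0) for slab in recv]
    # Gather into one n0 x n1 x n2 array (for checking only).
    return [[row for r in range(p) for row in done[r][i]] for i in range(n0)]
```

A slab layout splits only one axis, so the number of ranks cannot exceed the grid size along that axis. A pencil layout splits two axes and needs two such exchanges, which allows more ranks at the cost of extra communication.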