high performance computing on graphics processing units: hgpu.org


Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Akira Nukada, Kento Sato, Satoshi Matsuoka

Tokyo Institute of Technology

International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012

@inproceedings{nukada2012scalable,
  title={Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer},
  author={Nukada, A. and Sato, K. and Matsuoka, S.},
  booktitle={Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
  pages={44},
  year={2012},
  organization={IEEE Computer Society Press}
}


For scalable 3-D FFT computation on multiple GPUs, efficient all-to-all communication between GPUs is the most important factor for good performance. Implementations built on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads in all-to-all communication among many nodes, especially for small message sizes. We propose several schemes to minimize these overheads, including use of the lower-level InfiniBand API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance: up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer.
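Why all-to-all dominates, and why message sizes get small, can be seen from the global transpose that a distributed 3-D FFT performs between its local FFT stages. The sketch below is illustrative only and is not the authors' implementation. It assumes a simple slab decomposition of an N x N x N double-complex array over P ranks, while the paper's actual decomposition and its InfiniBand-level scheme are not modeled. The function names `block_bytes` and `alltoall_transpose` are hypothetical. Each rank starts with a slab of x-planes and ends with a slab of y-planes, and every rank exchanges one block of N^3/P^2 elements with every other rank.

```python
def block_bytes(n: int, p: int, elem_bytes: int = 16) -> int:
    """Bytes each rank sends to each other rank in one slab transpose.

    Assumes an n^3 grid split into p slabs; 16 bytes = one double-complex value.
    The block is (n/p) x (n/p) x n elements, i.e. n^3 / p^2 elements.
    """
    if n % p != 0:
        raise ValueError("this sketch requires n divisible by p")
    return elem_bytes * n**3 // p**2


def alltoall_transpose(slabs, n: int, p: int):
    """Simulate the all-to-all transpose among p ranks in one process.

    Input:  slabs[r][lx][y][z] holds global x = r*b + lx (slab along x).
    Output: out[r][ly][x][z]  holds global y = r*b + ly (slab along y).
    """
    if n % p != 0:
        raise ValueError("this sketch requires n divisible by p")
    b = n // p  # planes per rank

    # Pack: send[s][d] is the b x b block (each entry a length-n z-line)
    # that source rank s must deliver to destination rank d.
    send = [[[[slabs[s][lx][d * b + ly] for ly in range(b)]
              for lx in range(b)]
             for d in range(p)]
            for s in range(p)]

    # Exchange: the all-to-all itself; rank d receives block send[s][d]
    # from every rank s (including itself).
    recv = [[send[s][d] for s in range(p)] for d in range(p)]

    # Unpack: place each received block into the new y-slab layout.
    out = []
    for d in range(p):
        new = [[None] * n for _ in range(b)]
        for s in range(p):
            for lx in range(b):
                for ly in range(b):
                    new[ly][s * b + lx] = recv[d][s][lx][ly]
        out.append(new)
    return out


if __name__ == "__main__":
    # Per-pair message size shrinks as 1/P^2 for a fixed (hypothetical) N = 512.
    for p in (8, 64, 512):
        print(f"P={p}: {block_bytes(512, p)} bytes per rank pair")
```

For N = 512 this gives 32 MiB per rank pair at P = 8, 512 KiB at P = 64, and only 8 KiB at P = 512. This is the small-message regime in which per-message latency and software overhead, rather than bandwidth, limit the exchange. With 768 GPUs on 256 nodes (3 GPUs per node), some of these pairs are intra-node and the rest cross the network, which is why overlapping the two kinds of transfer matters.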

Tags: Computer science, CUDA, FFT, MPI, nVidia, Tesla M2090

November 23, 2012 by hgpu
