Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Akira Nukada, Kento Sato, Satoshi Matsuoka
Tokyo Institute of Technology
International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012

@inproceedings{nukada2012scalable,
   title={Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer},
   author={Nukada, A. and Sato, K. and Matsuoka, S.},
   booktitle={Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
   pages={44},
   year={2012},
   organization={IEEE Computer Society Press}
}


For scalable 3-D FFT computation on multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in achieving good performance. Implementations built on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication among many nodes. We propose several schemes to minimize these overheads, including the use of a lower-level InfiniBand API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance: up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer.
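
To make the role of the all-to-all step concrete, the following is a minimal sketch in Python with NumPy that simulates a distributed 3-D FFT on a single machine. It is not the authors' implementation: the slab decomposition, the function names `slab_fft3d` and `message_bytes`, and the use of in-memory lists in place of GPUs, MPI, or InfiniBand are all assumptions made for illustration. It only shows where the all-to-all exchange sits between the local FFT stages, and how the per-pair message size shrinks as the number of ranks grows.

```python
import numpy as np


def slab_fft3d(x, p):
    """3-D FFT of an (n, n, n) array using p simulated ranks.

    Hypothetical illustration: each "rank" is just a list entry, and the
    all-to-all exchange is done with array slicing instead of a network.
    """
    n = x.shape[0]
    assert x.shape == (n, n, n) and n % p == 0

    # Each rank owns a slab of n/p planes along axis 0.
    slabs = np.split(x.astype(np.complex128), p, axis=0)

    # Step 1: local 2-D FFTs along the two axes each rank holds in full.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Step 2: all-to-all. Rank i cuts its slab into p blocks along axis 1
    # and sends block j to rank j; rank j stacks what it receives along axis 0.
    blocks = [np.split(s, p, axis=1) for s in slabs]
    received = [
        np.concatenate([blocks[i][j] for i in range(p)], axis=0)
        for j in range(p)
    ]

    # Step 3: each rank now holds axis 0 in full, so it can finish with
    # local 1-D FFTs along that axis.
    received = [np.fft.fft(r, axis=0) for r in received]

    # Gather the distributed result into one array (for checking only).
    return np.concatenate(received, axis=1)


def message_bytes(n, p, bytes_per_element=16):
    """Bytes one rank sends to one other rank in the all-to-all step.

    Each block has (n/p) * (n/p) * n double-complex elements.
    """
    return (n // p) * (n // p) * n * bytes_per_element


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 8))
    print(np.allclose(slab_fft3d(x, 4), np.fft.fftn(x)))
    for p in (4, 16, 64):
        print(p, message_bytes(256, p))
```

In this sketch the per-pair message size falls as n^3 / p^2, so quadrupling the number of ranks cuts each message by a factor of 16. That is consistent with the abstract's remark that small messages dominate when many nodes take part in the exchange.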
