high performance computing on graphics processing units: hgpu.org


Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Akira Nukada, Kento Sato, Satoshi Matsuoka

Tokyo Institute of Technology

International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012

@inproceedings{nukada2012scalable,
  title={Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer},
  author={Nukada, A. and Sato, K. and Matsuoka, S.},
  booktitle={Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
  pages={44},
  year={2012},
  organization={IEEE Computer Society Press}
}


For scalable 3-D FFT computation on multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in achieving good performance. Implementations built on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication among many nodes. We propose several schemes to minimize these overheads, including the use of InfiniBand's lower-level API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance, up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer.
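Why do all-to-all messages get small at scale? The back-of-envelope sketch below makes this concrete. It assumes a simple slab decomposition of an N x N x N double-complex grid with one global transpose between the 2-D and 1-D FFT stages. The abstract does not state the paper's actual decomposition or problem sizes, so the grid size `n = 512`, the GPU counts, and the helper names (`alltoall_message_bytes`, `simulate_alltoall`) are all hypothetical. The sketch also ignores the intra-node versus inter-node distinction that the paper exploits.

```python
"""Back-of-envelope all-to-all message sizes for a slab-decomposed 3-D FFT.

Assumptions (illustrative, not taken from the paper): an N x N x N grid of
double-precision complex values (16 bytes each), split into slabs across
P GPUs, with one global transpose (all-to-all) between the 2-D and 1-D FFT
stages.
"""

COMPLEX128_BYTES = 16  # one double-precision complex value


def alltoall_message_bytes(n: int, p: int) -> int:
    """Bytes each GPU sends to each *other* GPU in one slab transpose.

    Each GPU owns n/p planes (n*n*n/p points) and splits them into p equal
    blocks of (n/p) * (n/p) * n points, one block per destination GPU.
    """
    if n % p != 0:
        raise ValueError("slab decomposition needs p to divide n")
    return COMPLEX128_BYTES * n**3 // p**2


def simulate_alltoall(send):
    """Toy all-to-all: recv[dst][src] = send[src][dst] (pure data movement)."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


if __name__ == "__main__":
    n = 512  # hypothetical grid edge length
    for p in (64, 128, 256, 512):
        size = alltoall_message_bytes(n, p)
        # p * (p - 1) point-to-point messages in total, summed over all GPUs
        print(f"P={p:4d}: {size // 1024:6d} KiB per message, "
              f"{p * (p - 1)} messages per transpose")
```

Under strong scaling (fixed N, growing P), the per-message size in this model shrinks as 1/P² while the number of messages grows as P². For N = 512, going from 64 to 512 GPUs drops each message from 512 KiB to 8 KiB. This is the regime where the abstract reports that per-message overheads of point-to-point MPI and CUDA copy calls dominate.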

Tags: Computer science, CUDA, FFT, MPI, NVIDIA, Tesla M2090

November 23, 2012 by hgpu
