high performance computing on graphics processing units: hgpu.org

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka

Tokyo Institute of Technology, Tokyo, Japan and Japan Science and Technology Agency, Kawaguchi, Saitama, Japan

In SC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing (2008), pp. 1-11

DOI:10.1145/1413370.1413376

@conference{nukada2009bandwidth,
  title={Bandwidth intensive 3-D FFT kernel for GPUs using CUDA},
  author={Nukada, A. and Ogata, Y. and Endo, T. and Matsuoka, S.},
  booktitle={High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for},
  pages={1--11},
  year={2009},
  organization={IEEE}
}

Most GPU performance hype has focused on tightly coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines with substantial memory bandwidth; effective programming methodologies for exploiting that bandwidth, however, have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, more than three times faster than any existing FFT implementation on GPUs, including CUFFT. Careful programming techniques are employed to fully exploit modern GPU hardware characteristics while overcoming their limitations, including on-chip shared memory utilization, optimizing the number of threads and registers through appropriate localization, and avoiding low-speed strided memory accesses. Applied to real applications, our kernel achieves orders-of-magnitude improvements in power and cost versus performance metrics. The off-card bandwidth limitation remains an issue; it could be alleviated somewhat by confining application kernels to the card, though the ideal solution is faster GPU interfaces.
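To illustrate where the strided memory accesses mentioned in the abstract come from, here is a minimal sketch in plain Python. It is not the authors' CUDA kernel. It relies only on the standard fact that a 3-D FFT is separable into 1-D FFTs along each axis, and it assumes a row-major `a[x][y][z]` layout. The function names (`fft1d`, `fft3d`, `dft3d_point`, `axis_strides`) are hypothetical and chosen for this example.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3-D FFT of a nested list a[x][y][z] via 1-D FFTs along z, y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    b = [[[complex(a[i][j][k]) for k in range(nz)]
          for j in range(ny)] for i in range(nx)]
    # z axis: elements of each 1-D transform are adjacent in row-major order.
    for i in range(nx):
        for j in range(ny):
            b[i][j] = fft1d(b[i][j])
    # y axis: consecutive elements are nz apart in row-major order.
    for i in range(nx):
        for k in range(nz):
            line = fft1d([b[i][j][k] for j in range(ny)])
            for j in range(ny):
                b[i][j][k] = line[j]
    # x axis: consecutive elements are ny * nz apart in row-major order.
    for j in range(ny):
        for k in range(nz):
            line = fft1d([b[i][j][k] for i in range(nx)])
            for i in range(nx):
                b[i][j][k] = line[i]
    return b


def dft3d_point(a, u, v, w):
    """One output element computed directly from the 3-D DFT definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    total = 0j
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                phase = u * i / nx + v * j / ny + w * k / nz
                total += a[i][j][k] * cmath.exp(-2j * cmath.pi * phase)
    return total


def axis_strides(nx, ny, nz):
    """Element strides of the x, y, z axes in a flat row-major array."""
    return (ny * nz, nz, 1)
```

In this layout only the z-axis pass reads neighbouring elements; the y and x passes step through memory by `nz` and `ny * nz` elements respectively. That is the kind of access pattern the abstract describes as low-speed on GPUs. How the paper's kernel actually reorganises these passes is not shown here.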

Tags: Computer science, CUDA, FFT, nVidia

December 7, 2010 by hgpu
