high performance computing on graphics processing units: hgpu.org

An efficient, model-based CPU-GPU heterogeneous FFT library

Y. Ogata, T. Endo, N. Maruyama, S. Matsuoka

Tokyo Inst. of Technol., Tokyo

IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), 2008, pp. 1-10.

DOI:10.1109/IPDPS.2008.4536163

General-purpose computing on graphics processing units (GPGPU) is becoming popular in HPC because of its high peak performance. However, in spite of the potential performance improvements as well as recent promising results in scientific computing applications, its real performance is not necessarily higher than that of the current high-performance CPUs, especially with recent trends towards increasing the number of cores on a single die. This is because the GPU performance can be severely limited by such restrictions as memory size and bandwidth and programming using graphics-specific APIs. To overcome this problem, we propose a model-based, adaptive library for 2D FFT that automatically achieves optimal performance using available heterogeneous CPU-GPU computing resources. To find optimal load distribution ratios between CPUs and GPUs, we construct a performance model that captures the respective contributions of CPU vs. GPU, and predicts the total execution time of 2D-FFT for arbitrary problem sizes and load distribution. The performance model divides the FFT computation into several small sub steps, and predicts the execution time of each step using profiling results. Preliminary evaluation with our prototype shows that the performance model can predict the execution time of problem sizes that are 16 times as large as the profile runs with less than 20% error, and that the predicted optimal load distribution ratios have less than 1% error. We show that the resulting performance improvement using both CPUs and GPUs can be as high as 50% compared to using either a CPU core or a GPU.
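The load-distribution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model, which splits the FFT into several profiled substeps. It assumes a much simpler linear cost per row on each device plus a fixed GPU transfer overhead, and all names and coefficients (`cpu_per_row`, `gpu_per_row`, `gpu_overhead`) are hypothetical. Since the CPU and GPU work concurrently, the predicted total time is the slower of the two, and the sketch searches for the GPU share that minimizes it.

```python
def predict_times(n_rows, r, cpu_per_row, gpu_per_row, gpu_overhead):
    """Predict (cpu_time, gpu_time) when a fraction r of the rows goes to the GPU.

    Assumption: cost is linear in the number of rows on each device, and the
    GPU pays a fixed overhead (e.g. data transfer) only if it gets any rows.
    """
    gpu_rows = r * n_rows
    cpu_rows = n_rows - gpu_rows
    cpu_time = cpu_rows * cpu_per_row
    gpu_time = gpu_overhead + gpu_rows * gpu_per_row if gpu_rows > 0 else 0.0
    return cpu_time, gpu_time


def best_ratio(n_rows, cpu_per_row, gpu_per_row, gpu_overhead, steps=1000):
    """Grid-search the GPU share r in [0, 1] that minimizes the predicted time.

    The devices run concurrently, so the predicted total is max(cpu, gpu).
    """
    best_r, best_t = 0.0, float("inf")
    for i in range(steps + 1):
        r = i / steps
        cpu_t, gpu_t = predict_times(
            n_rows, r, cpu_per_row, gpu_per_row, gpu_overhead
        )
        total = max(cpu_t, gpu_t)
        if total < best_t:
            best_r, best_t = r, total
    return best_r, best_t


if __name__ == "__main__":
    # Hypothetical coefficients, standing in for values obtained by profiling.
    r, t = best_ratio(n_rows=1000, cpu_per_row=1.0, gpu_per_row=1.0,
                      gpu_overhead=100.0)
    print(f"GPU share: {r:.3f}, predicted time: {t:.1f}")
```

In this toy model the optimum lies where the two predicted times meet, which is why a small error in the predicted ratio matters less than a large error in either device's absolute time.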

Tags: Computer science, FFT

October 28, 2010 by hgpu

