
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

Chris Gregg, Kim Hazelwood
Department of Computer Science, University of Virginia
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2011

@inproceedings{greggdata,
   title={Where is the Data? Why You Cannot Debate CPU vs. GPU Performance Without the Answer},
   author={Gregg, Chris and Hazelwood, Kim},
   booktitle={IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
   year={2011}
}


General-purpose GPU computing (GPGPU) has taken off in the past few years, promising increased desktop processing power thanks to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups of as much as 1000x over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code reduces the performance gap significantly. We demonstrate that this important discussion is missing a key consideration: where in the system the data resides, and the overhead of moving it to where it will be used and back again if necessary. We have benchmarked a broad set of GPU kernels on a number of platforms with different GPUs, and our results show that when memory transfer times are included, running a kernel can easily take 2x to 50x longer than the GPU processing time alone. Therefore, it is necessary either to include memory transfer overhead when reporting GPU performance, or to explain why it is not relevant for the application in question. We suggest a taxonomy for future CPU/GPU comparisons, and we argue that this is not only germane to reporting performance, but also important to heterogeneous scheduling research in general.
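To illustrate the measurement point the abstract raises (this is a minimal sketch, not the paper's benchmark harness), the following CUDA program times the host-to-device transfer, the kernel execution, and the device-to-host transfer separately using CUDA events; the scale kernel, the array size, and the launch configuration are placeholders chosen for illustration only.

// Sketch: separate timing of transfers vs. kernel work with CUDA events.
// The kernel and data size are illustrative, not from the paper.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 24;                    // ~16M floats (~64 MB), arbitrary
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    cudaEvent_t start, afterH2D, afterKernel, afterD2H;
    cudaEventCreate(&start);
    cudaEventCreate(&afterH2D);
    cudaEventCreate(&afterKernel);
    cudaEventCreate(&afterD2H);

    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // host -> device copy
    cudaEventRecord(afterH2D);
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);       // kernel work only
    cudaEventRecord(afterKernel);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // device -> host copy
    cudaEventRecord(afterD2H);
    cudaEventSynchronize(afterD2H);

    float h2d, kernel, d2h;
    cudaEventElapsedTime(&h2d, start, afterH2D);
    cudaEventElapsedTime(&kernel, afterH2D, afterKernel);
    cudaEventElapsedTime(&d2h, afterKernel, afterD2H);

    printf("H2D: %.3f ms  kernel: %.3f ms  D2H: %.3f ms  total: %.3f ms\n",
           h2d, kernel, d2h, h2d + kernel + d2h);

    cudaFree(d);
    free(h);
    return 0;
}

Reporting only the kernel time from such a run corresponds to the "GPU processing time alone" in the abstract; including the H2D and D2H figures is what can inflate the end-to-end time by the 2x to 50x the authors observed for transfer-bound workloads.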
