Exploring The Latency and Bandwidth Tolerance of CUDA Applications

Gregory Diamos, Sudnya Padalikar
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250
NFinTes Tech Report, December 2009








CUDA applications represent a new body of parallel programs. Although several paradigms exist for programming distributed systems and many-core processors, many users struggle to achieve a program that is scalable across systems with different hardware characteristics. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies, hiding a hardware detail from the programmer and making parallel programming more accessible to non-experts. We use a combination of the Ocelot PTX emulator [1] and a discrete event simulator to evaluate the UIUC Parboil benchmarks [2] on three distinct GPU configurations. We find that these applications are sensitive to neither interconnect latency nor bandwidth, and that integrated GPU-CPU systems are not likely to perform any better than discrete GPUs or GPU clusters.
