Exploring The Latency and Bandwidth Tolerance of CUDA Applications
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250
NFinTes Tech Report, December 2009
@techreport{diamos2009exploring,
  title={Exploring The Latency and Bandwidth Tolerance of CUDA Applications},
  author={Diamos, G. and Padalikar, S.},
  institution={NFinTes},
  month={December},
  year={2009}
}
CUDA applications represent a new body of parallel programs. Although several paradigms exist for programming distributed systems and many-core processors, many users struggle to write programs that scale across systems with different hardware characteristics. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies, a hardware detail that is typically hidden from the programmer to make parallel programming more accessible to non-experts. We use a combination of the Ocelot PTX emulator [1] and a discrete event simulator to evaluate the UIUC Parboil benchmarks [2] on three distinct GPU configurations. We find that these applications are sensitive to neither interconnect latency nor bandwidth, and that integrated GPU-CPU systems are therefore unlikely to perform any better than discrete GPUs or GPU clusters.
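As a rough illustration of the kind of analysis the abstract describes (this is not the authors' simulator; every parameter value below is a hypothetical assumption), one can model a kernel launch as interconnect transfer time plus compute time and sweep the latency to see when it matters:

```python
# Hypothetical first-order model, not the paper's discrete event simulator:
# total time for one kernel = host-to-device transfer + compute +
# device-to-host transfer. All numeric values are illustrative assumptions.

def kernel_time(bytes_moved, flops, latency_s, bandwidth_bps, gpu_flops):
    """Time for one kernel launch: two transfers (in/out) plus compute."""
    transfer = 2 * (latency_s + bytes_moved / bandwidth_bps)
    compute = flops / gpu_flops
    return transfer + compute

def latency_sensitivity(bytes_moved, flops, bandwidth_bps, gpu_flops,
                        lat_lo=1e-6, lat_hi=1e-3):
    """Slowdown ratio at high vs. low latency; a value near 1.0
    means the application is latency tolerant."""
    t_lo = kernel_time(bytes_moved, flops, lat_lo, bandwidth_bps, gpu_flops)
    t_hi = kernel_time(bytes_moved, flops, lat_hi, bandwidth_bps, gpu_flops)
    return t_hi / t_lo

# A compute-heavy kernel (many flops per byte moved) barely notices a
# 1000x latency increase, mirroring the tolerance the paper reports.
ratio = latency_sensitivity(bytes_moved=64e6, flops=1e12,
                            bandwidth_bps=8e9, gpu_flops=1e12)
print(f"slowdown from 1us -> 1ms latency: {ratio:.3f}x")
```

Under these assumed numbers the slowdown is well under one percent, which is the qualitative behavior the paper reports for the Parboil benchmarks; a transfer-dominated kernel would instead show a ratio far above 1.0.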