Analyzing the CUDA Applications with its Latency and Bandwidth Tolerance

V.N. Pistulkar, C.A. Uttarwar
Department of Computer Science, Jawaharlal Darda Institute of Engineering & Technology, Yavatmal, MS, India
BIOINFO Computer Engineering, Volume 2, Issue 1, pp.25-30, 2012








The CUDA scalable parallel programming model provides readily understood abstractions that free programmers to focus on efficient parallel algorithms. It uses a hierarchy of thread groups, shared memory, and barrier synchronization to express both fine-grained and coarse-grained parallelism, with the code for each thread written as sequential C. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies, a hardware detail that the programming model hides from the programmer, which makes parallel programming more accessible to nonexperts. We use a combination of the Ocelot PTX emulator [1] and a discrete event simulator to evaluate the UIUC Parboil benchmarks [2] on three distinct GPU configurations. We find that these applications are sensitive to neither interconnect latency nor interconnect bandwidth, and that integrated GPU-CPU systems are not likely to perform any better than discrete GPUs or GPU clusters.
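As a rough illustration of the abstractions named in the abstract (thread groups, shared memory, barrier synchronization, and sequential per-thread C code), the following is a minimal sketch of a block-level sum. It is not taken from the paper or from the Parboil benchmarks; the names `blockSum`, `sumOnGpu`, and `kThreadsPerBlock` are hypothetical, and the block size of 256 is an assumption. The two `cudaMemcpy` calls mark where data would cross the CPU-GPU interconnect on a discrete GPU.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size (assumption); must be a power of two for the
// tree reduction below.
constexpr int kThreadsPerBlock = 256;

// Each thread executes this sequential C code. Coarse-grained parallelism
// comes from independent blocks, fine-grained from threads within a block.
__global__ void blockSum(const float* in, float* partial, int n) {
    __shared__ float tile[kThreadsPerBlock];  // visible to one block only

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage one element per thread into shared memory (0 past the end).
    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();  // barrier: all loads finish before any thread reads

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();  // barrier between reduction levels
    }

    if (tid == 0) {
        partial[blockIdx.x] = tile[0];  // one partial sum per block
    }
}

// Host-side helper (hypothetical): copies input to the device, launches the
// kernel, and adds up the per-block partial sums on the CPU.
float sumOnGpu(const std::vector<float>& host) {
    if (host.empty()) return 0.0f;
    int n = static_cast<int>(host.size());
    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    float* dIn = nullptr;
    float* dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));

    // Host-to-device transfer over the interconnect.
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kThreadsPerBlock>>>(dIn, dPartial, n);
    assert(cudaGetLastError() == cudaSuccess);  // sketch-level launch check

    // Device-to-host transfer; this copy waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dPartial);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

In this sketch only the two copies depend on the interconnect, while the reduction itself runs entirely out of on-chip shared memory. How much those copies matter relative to kernel time is the kind of question the paper's simulation addresses; the sketch itself makes no claim about the measured results.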
