Data parallel loop statement extension to CUDA: GpuC
Department of Computer Engineering, Kadir Has University, Istanbul 34083, Turkey
Symposium on Application Accelerators in High Performance Computing, 2009 (SAAHPC’09)
@inproceedings{bozkus2009data,
  title={Data parallel loop statement extension to CUDA: GpuC},
  author={Bozkus, Z. and Thakur, R. and Gropp, W. and Lusk, E.},
  booktitle={Application Accelerators in High Performance Computing, 2009 Symposium, Papers},
  year={2009}
}
In recent years, Graphics Processing Units (GPUs) have emerged as powerful accelerators for general-purpose computation. GPUs ship alongside the host CPU in virtually every modern desktop and laptop as graphics accelerators, and they offer over a hundred cores with abundant parallelism. Initially, they were used only for graphics applications such as image processing and video games; however, many other applications are now being ported to GPUs to extend their power beyond graphics. Current approaches to programming GPUs are still relatively low-level programming models such as the Compute Unified Device Architecture (CUDA), a programming model from NVIDIA, and the Open Computing Language (OpenCL), created by Apple in cooperation with others. These two programming models expose all the complexity of parallel programming: breaking a task into smaller tasks, assigning those subtasks to multiple processors to work on simultaneously, and coordinating the processors. There is a growing need to lower the complexity of programming these devices. In this paper, we propose a data-parallel loop (forall) extension to the CUDA programming model and describe our prototype compiler, named GpuC. The compiler takes data-parallel forall loops, along with the other CUDA statements, as input and generates CUDA code as output. We present the compilation steps, optimizations, and code generation, and we identify several key optimizations for the compiler. We present experimental results from four NAS benchmarks to show performance gains.
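To make the idea concrete, here is a minimal sketch. The abstract does not show the actual GpuC forall syntax or its code-generation scheme, so both the source form and the generated CUDA below are assumptions: a data-parallel forall loop over n elements, and the kernel plus launch that a compiler in this style might emit, mapping one loop iteration to one GPU thread.

// Assumed source form (illustrative only; not the actual GpuC syntax):
//
//     forall (i = 0; i < n; i++)
//         c[i] = a[i] + b[i];
//
// CUDA that such a compiler might generate for this loop:

__global__ void forall_kernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one loop iteration per thread
    if (i < n)                                      // guard threads past the loop bound
        c[i] = a[i] + b[i];
}

void forall_launch(const float *d_a, const float *d_b, float *d_c, int n)
{
    int threads = 256;                            // a common block-size choice
    int blocks = (n + threads - 1) / threads;     // enough blocks to cover all n iterations
    forall_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
}

The division of iterations among threads, the block size, and the bounds guard are the kinds of decisions such a compiler would make automatically in place of the hand-written CUDA shown here.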