
Data parallel loop statement extension to CUDA: GpuC

Zeki Bozkus, Rajeev Thakur, William Gropp, Ewing Lusk
Department of Computer Engineering, Kadir Has University, Istanbul, 34083 Turkey
Symposium on Application Accelerators in High Performance Computing, 2009 (SAAHPC’09)

@inproceedings{bozkus2009data,
  title     = {Data parallel loop statement extension to CUDA: GpuC},
  author    = {Bozkus, Zeki and Thakur, Rajeev and Gropp, William and Lusk, Ewing},
  booktitle = {Symposium on Application Accelerators in High Performance Computing (SAAHPC'09)},
  year      = {2009}
}


In recent years, Graphics Processing Units (GPUs) have emerged as powerful accelerators for general-purpose computation. GPUs are attached to virtually every modern desktop and laptop as graphics accelerators alongside the host CPU, and they provide over a hundred cores with abundant parallelism. Initially they were used only for graphics applications such as image processing and video games, but many other applications are now being ported to GPUs to extend their power beyond graphics. Current approaches to programming GPUs remain relatively low-level programming models, such as the Compute Unified Device Architecture (CUDA) from NVIDIA and the Open Computing Language (OpenCL), created by Apple in cooperation with others. These two programming models expose all the complexity of parallel programming: breaking the problem into smaller tasks, assigning those tasks to many processing units that run simultaneously, and coordinating them. There is a growing need to lower the complexity of programming these devices. In this paper, we propose a data-parallel loop (forall) extension to the CUDA programming model and describe our prototype compiler, GpuC. The compiler takes data-parallel forall loops along with other CUDA statements as input and generates CUDA code as output. We present the compilation steps, optimizations, and code generation, and we identify several key optimizations for the compiler. We present experimental results from four NAS benchmarks to show the performance gains.
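The paper itself defines the forall syntax and the translation scheme; the sketch below only illustrates the general idea and is not GpuC's actual output. The forall loop shown in the comment, the kernel name forall_body, and the launch configuration are assumptions chosen for this example. A source-to-source compiler of this kind would map each loop iteration to one GPU thread and emit an equivalent CUDA kernel plus a launch, roughly as follows.

// Hypothetical source accepted by a forall-style extension (illustrative syntax only):
//
//     forall (i = 0; i < n; i++)
//         c[i] = a[i] + b[i];
//
// A source-to-source compiler could lower such a loop into an ordinary CUDA
// kernel plus a launch that maps one loop iteration to one GPU thread.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void forall_body(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global iteration index
    if (i < n)                                       // guard against padding threads
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Generated launch: a 1-D grid that covers the iteration space [0, n).
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    forall_body<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);   // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}

Mapping each iteration to a thread and adding the bounds guard is the standard lowering for such loops; the compiler described in the paper additionally applies the optimizations mentioned in the abstract, which this sketch does not reproduce.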