Breaking the GPU programming barrier with the auto-parallelising SAC compiler

Jing Guo, Jeyarajan Thiyagalingam, Sven-Bodo Scholz

University of Hertfordshire, Hatfield, United Kingdom

Proceedings of the sixth workshop on Declarative aspects of multicore programming, DAMP ’11, 2011

DOI:10.1145/1926354.1926359

@inproceedings{guo2011breaking,
  title={Breaking the GPU programming barrier with the auto-parallelising SAC compiler},
  author={Guo, J. and Thiyagalingam, J. and Scholz, S.B.},
  booktitle={Proceedings of the sixth workshop on Declarative aspects of multicore programming},
  pages={15--24},
  year={2011},
  organization={ACM}
}

Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although frameworks such as CUDA and OpenCL have significantly improved the programming experience, it is still a challenge to utilise these devices effectively. Directive-based approaches such as hiCUDA or OpenMP variants offer further improvements but have not eliminated the need for expertise in these complex architectures. Similarly, special-purpose programming languages such as Microsoft’s Accelerator try to lower the barrier further. They provide the programmer with a special form of GPU data structures and operations on them, which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach one step further. We generate CUDA code from a MATLAB-like, high-level functional array programming language, Single Assignment C (SaC). To do so, we identify which data structures and operations can be successfully mapped onto GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend, along with the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that for a number of benchmarks, speedups by factors of between 5 and 50 can be achieved through our parallelising compiler.

Tags: Algorithms, Compilers, Computer science, CUDA, nVidia, Performance, Programming Languages

August 21, 2011 by hgpu
