Effectiveness of program transformations and compilers for directive-based GPU programming models
University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign, 2013
@article{padua2013effectiveness,
title={Effectiveness of program transformations and compilers for directive-based GPU programming models},
author={Padua, D.A. and Garzaran, M.J.},
year={2013}
}
Accelerator devices such as General-Purpose Graphics Processing Units (GPGPUs) play an important role in enhancing the performance of many contemporary scientific applications. However, programming GPUs in languages like C for CUDA or OpenCL requires a relatively high investment of time, and the resulting programs are often fine-tuned to perform well only on a particular device. The alternative is to program in a conventional, machine-independent notation and use compilers to transform CPU programs into heterogeneous form, either automatically or by relying on directives from the programmer. These compilers can offer code portability and increased programmer productivity without imposing a large performance penalty. This thesis evaluates the quality of early versions of two compilers, the PGI compiler and the Cray compiler, as tools for translating C programs written for single-core or multicore CPUs into heterogeneous programs that execute on NVIDIA GPUs. In our methodology, we apply a sequence of transformations to CPU programs that allow the compilers to form GPU kernels from loops, and then we analyze the impact of each transformation on the performance of the compiled programs. Our evaluation of the performance of 15 application kernels further shows that the executables produced by the PGI and Cray compilers can achieve reasonable, and in some cases equivalent, performance compared to hand-written OpenMP and CUDA codes. Our results also show that the Cray compiler produced faster executables than the PGI compiler for more of the applications. We show that, for a heterogeneous program to execute faster, the traditional analyses and optimizations needed to produce a good sequential program are as valuable as, if not more valuable than, those needed to produce a good GPU kernel.
At the end of this thesis, we also provide a set of guidelines for programmers on extracting good performance from the heterogeneous executables produced by the PGI and Cray compilers.
February 9, 2013 by hgpu