Developing a High Performance GPGPU Compiler Using Cetus
North Carolina State University
Cetus Users and Compiler Infrastructure Workshop, International Conference on Parallel Architectures and Compilation Techniques (PACT’11), 2011
@inproceedings{yang2011developing,
title={Developing a High Performance GPGPU Compiler Using Cetus},
author={Yang, Yi and Zhou, H.},
booktitle={Cetus Users and Compiler Infrastructure Workshop, International Conference on Parallel Architectures and Compilation Techniques (PACT’11)},
year={2011}
}
In this paper we present our experience in developing an optimizing compiler for general-purpose computation on graphics processing units (GPGPU) based on the Cetus compiler framework. The input to our compiler is a naive GPU kernel procedure, which is functionally correct but written without any consideration for performance optimization. Our compiler applies a set of optimization techniques to the naive kernel and generates an optimized GPU kernel. The implementation of our compiler is facilitated by the Cetus infrastructure, in which each code transformation is called a pass. We classify all the passes used in our work into two categories: functional passes and optimization passes. The functional passes translate input kernels into the desired intermediate representation, which clearly represents memory access patterns and thread configurations; the CUDA language support pass is derived from MCUDA. A series of optimization passes improves the performance of the kernels by adapting them to the GPGPU architecture. Our experiments show that the optimized code achieves very high performance, either superior to or very close to that of highly fine-tuned libraries.
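To make "naive kernel" concrete, here is a minimal sketch of the kind of input such a compiler starts from. Matrix multiplication is an illustrative choice, not necessarily one of the paper's benchmarks. To keep the sketch runnable without a GPU, it is written as plain C++ that emulates CUDA's thread indexing. The names Dim2, naive_mm_kernel, launch and run_naive_mm are hypothetical, and the sequential launch loop stands in for a real <<<grid, block>>> launch. Each logical thread computes one output element by reading directly from global memory. That is functionally correct, but it leaves data reuse and memory access patterns entirely to later optimization passes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 and built-in index variables (x, y only).
struct Dim2 {
    int x;
    int y;
};

// Naive matrix-multiply "kernel": each logical thread computes exactly one
// element C[row][col]. All operands are read directly from (emulated) global
// memory: no tiling, no shared-memory reuse, no attention to coalescing.
void naive_mm_kernel(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
                     const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;  // surplus threads in edge blocks do nothing
    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;
}

// Sequential CPU emulation of naive_mm_kernel<<<grid, block>>>(A, B, C, n):
// every (block, thread) pair is visited exactly once, in a fixed order.
void launch(Dim2 grid, Dim2 block, const float* A, const float* B, float* C, int n) {
    // The grid must cover the whole n x n output matrix.
    assert(grid.x * block.x >= n && grid.y * block.y >= n);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    naive_mm_kernel(Dim2{bx, by}, block, Dim2{tx, ty}, A, B, C, n);
}

// Host-side helper (hypothetical): choose a grid that covers n x n with the
// given block shape (ceiling division), then run the emulated launch.
std::vector<float> run_naive_mm(const std::vector<float>& A,
                                const std::vector<float>& B, int n, Dim2 block) {
    std::vector<float> C(n * n, 0.0f);
    Dim2 grid{(n + block.x - 1) / block.x, (n + block.y - 1) / block.y};
    launch(grid, block, A.data(), B.data(), C.data(), n);
    return C;
}
```

With n = 5 and 2 x 2 blocks, run_naive_mm uses a 3 x 3 grid. The bounds check in the kernel makes the surplus threads in the last row and column of blocks return immediately. An optimizing compiler would typically target exactly the pattern shown here, for example by improving coalescing of the A and B accesses or by reusing loaded values across threads. These are generic examples, not a list of this compiler's passes.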
October 13, 2011 by hgpu