A Compiler for Throughput Optimization of Graph Algorithms on GPUs
The University of Texas at Austin, USA
OOPSLA ’16, 2016
@article{pai2016compiler,
title={A Compiler for Throughput Optimization of Graph Algorithms on GPUs},
author={Pai, Sreepathi and Pingali, Keshav},
year={2016}
}
Writing high-performance GPU implementations of graph algorithms can be challenging. In this paper, we argue that three optimizations, called throughput optimizations, are key to high performance for this application class. These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand. To address this problem, we have implemented these optimizations in a compiler that produces CUDA code from an intermediate-level program representation called IrGL. Compared to state-of-the-art handwritten CUDA implementations of eight graph applications, code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never more than 30% slower for the others. Throughput optimizations contribute an improvement of up to 4.16x (median 1.4x) to the performance of unoptimized IrGL code.
September 20, 2016 by hgpu