High-Performance Code Generation for Stencil Computations on GPU Architectures
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
ACM International Conference on Supercomputing (ICS’12), 2012
@inproceedings{holewinski2012high,
title={High-Performance Code Generation for Stencil Computations on GPU Architectures},
author={Holewinski, J. and Pouchet, L.N. and Sadayappan, P.},
booktitle={ACM International Conference on Supercomputing (ICS'12)},
year={2012}
}
Stencil computations arise in many scientific computing domains, and often represent time-critical portions of applications. There is significant interest in offloading these computations to high-performance devices such as GPU accelerators, but these architectures pose challenges for developers and compilers alike. Stencil computations in particular require careful attention to off-chip memory access and the balancing of work among compute units in GPU devices. In this paper, we present a code generation scheme for stencil computations on GPU accelerators, which optimizes the code by trading an increase in the computational workload for a decrease in the required global memory bandwidth. We develop compiler algorithms for automatic generation of efficient, time-tiled stencil code for GPU accelerators from a high-level description of the stencil operation. We show that the code generation scheme can achieve high performance on a range of GPU architectures, including both NVIDIA and AMD devices.
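The trade described in the abstract, more computation in exchange for less global memory traffic, can be illustrated with a minimal CPU-side sketch of overlapped time tiling. This is not the authors' code generator or its output. The 1D 3-point Jacobi stencil, the fixed boundary cells, and the names `jacobi_step`, `reference`, and `overlapped_time_tiled` are all assumptions chosen for illustration. Each tile reads its cells plus a halo of `steps` cells on each side once, advances `steps` time steps locally (recomputing halo cells that neighboring tiles also compute), and writes back only its own cells.

```python
def jacobi_step(u):
    """One 3-point Jacobi step; the first and last cells are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def reference(u, steps):
    """Untiled baseline: every time step sweeps the whole array."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def overlapped_time_tiled(u, steps, tile):
    """Advance `steps` time steps per tile using a halo of width `steps`.

    Halo cells are recomputed by adjacent tiles (extra work), but each
    tile touches the full array only once on read and once on write.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # halo, clamped at the domain edge
        hi = min(end + steps, n)
        local = u[lo:hi]             # stands in for one global-memory read
        for _ in range(steps):
            local = jacobi_step(local)
        # Stale values creep in one cell per step from each cut edge,
        # so after `steps` steps only the halo is contaminated.
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(40)]
    tiled = overlapped_time_tiled(data, steps=4, tile=8)
    exact = reference(data, steps=4)
    print(max(abs(a - b) for a, b in zip(tiled, exact)))
```

On a GPU, the per-tile `local` buffer would typically live in on-chip memory, and the halo width grows with the number of fused time steps, which is what bounds how far the trade can profitably be pushed.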
April 28, 2012 by hgpu