Parametric GPU Code Generation for Affine Loop Programs
Imperial College London
The 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2013
@article{konstantinidis2013parametric,
title={Parametric GPU Code Generation for Affine Loop Programs},
author={Konstantinidis, Athanasios and Kelly, Paul HJ and Ramanujam, J and Sadayappan, P},
year={2013}
}
Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources that has a direct yet hard to estimate impact on performance. This paper develops the first compilation scheme for generating parametrically tiled codes for affine loop programs on GPUs which facilitates run-time exploration of partitioning parameters as a fast and portable way of finding the ones that yield maximum performance. Our approach is based on a parametric tiling scheme for producing wavefronts of parallel rectangular partitions of parametric size and a novel runtime system that manages wavefront execution and local memory usage dynamically through an inspector-executor mechanism. Our experimental evaluation demonstrates the effectiveness of our approach for wavefront as well as rectangularly-parallel partitionings.
October 5, 2013 by hgpu