Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
The Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
Northeastern University, 2012
@phdthesis{moore2012kernel,
title={Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)},
author={Moore, Nicholas John},
school={Northeastern University},
year={2012}
}
Graphics processing units (GPUs) offer significant speedups over CPUs for certain classes of applications. However, maximizing GPU performance can be difficult due to relatively high programming complexity and frequent hardware changes. Important performance optimizations are applied by the GPU compiler ahead of time and require fixed parameter values at compile time. As a result, many GPU codes offer minimal adaptability to variations among problem instances and hardware configurations. These factors limit code reuse and the applicability of GPU computing to a wider variety of problems. This dissertation introduces GPGPU kernel specialization, a technique that can be used to describe highly adaptable kernels that achieve high performance across different generations of GPUs. With kernel specialization, customized GPU kernels incorporating both problem- and implementation-specific parameters are compiled for each combination of problem and hardware instance. This dissertation explores the implementation and parameterization of three real-world applications that use kernel specialization and target two generations of NVIDIA CUDA-enabled GPUs: large template matching, particle image velocimetry, and cone-beam image reconstruction via backprojection. Starting with high-performance adaptable GPU kernels that compare favorably to multi-threaded and FPGA-based reference implementations, kernel specialization is shown to maintain adaptability while providing performance improvements in terms of both speedups and reductions in per-thread register usage. The proposed technique offers productivity benefits, the ability to adjust parameters that otherwise must be static, and a means to increase the complexity and parameterizability of GPGPU implementations beyond what would otherwise be feasible on current GPU hardware.
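The core idea described in the abstract is baking problem- and implementation-specific parameters into the kernel at compile time so the GPU compiler can unroll loops, fold constants, and reduce per-thread register usage. The sketch below illustrates that idea in CUDA C++ using a template parameter as the compile-time constant; the dissertation itself compiles specialized kernel source for each problem and hardware instance, so the 1-D correlation example, the kernel names, and the template-based mechanism here are illustrative assumptions rather than code from the thesis.

#include <cstdio>
#include <cuda_runtime.h>

// Generic kernel: the template length is a runtime argument, so the
// compiler cannot unroll the inner loop or size register usage for a
// known trip count.
__global__ void correlate_generic(const float* in, float* out,
                                  const float* tmpl, int tmplLen, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + tmplLen > n) return;
    float acc = 0.0f;
    for (int k = 0; k < tmplLen; ++k)   // trip count unknown at compile time
        acc += in[i + k] * tmpl[k];
    out[i] = acc;
}

// Specialized kernel: TMPL_LEN is fixed at compile time, so the loop can
// be fully unrolled and constant-folded, typically lowering register
// pressure and instruction overhead.
template <int TMPL_LEN>
__global__ void correlate_specialized(const float* in, float* out,
                                      const float* tmpl, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + TMPL_LEN > n) return;
    float acc = 0.0f;
    #pragma unroll
    for (int k = 0; k < TMPL_LEN; ++k)  // trip count known: fully unrolled
        acc += in[i + k] * tmpl[k];
    out[i] = acc;
}

int main()
{
    const int n = 1 << 20, tmplLen = 32;
    float *in, *out, *tmpl;
    cudaMalloc(&in,   n * sizeof(float));
    cudaMalloc(&out,  n * sizeof(float));
    cudaMalloc(&tmpl, tmplLen * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);
    // Same computation, two parameterizations: runtime vs. compile-time.
    correlate_generic<<<grid, block>>>(in, out, tmpl, tmplLen, n);
    correlate_specialized<32><<<grid, block>>>(in, out, tmpl, n);
    cudaDeviceSynchronize();

    cudaFree(in); cudaFree(out); cudaFree(tmpl);
    return 0;
}

In a runtime-compilation setting, the specialized variant would instead be generated as source with the parameter values substituted in (and compiled per problem/hardware instance), but the compiler-visible effect of the fixed parameter is the same as in this template-based sketch.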
October 26, 2012 by hgpu