Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs
Department of Informatics and Mathematics, Chair for Programming, University of Passau
University of Passau, 2012
@phdthesis{bayerl2012performance,
  title={Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs},
  author={Bayerl, Franz Xaver},
  school={University of Passau},
  year={2012}
}
General-Purpose computing on GPUs (GPGPU) provides the opportunity to utilize the tremendous computational power of graphics accelerators for a wider set of problems. These devices leverage massive parallelism to achieve high performance; however, creating highly parallel code that is optimized for the characteristics of GPUs is no simple task. The polyhedron model has been used successfully to parallelize code in many domains. We use polyhedral techniques to generate high-performance Compute Unified Device Architecture (CUDA) code for specific problems on specific architectures and evaluate whether automatically generated code can reach or exceed the performance of manually optimized code. In this thesis we focus our research on tensor contractions and General Matrix-Matrix Multiplication (GEMM). We identify aspects that are crucial for high performance and derive strategies for automatic code generation; these strategies are defined as transformations on polyhedral descriptions. Our experiments suggest that polyhedral code generators are able to generate CUDA code that achieves the same level of performance as manually optimized code.
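The abstract does not reproduce any of the kernels it discusses, but the kind of manually optimized GEMM code such a comparison targets typically stages tiles of the input matrices in shared memory. The following is a minimal sketch of that pattern, assuming square N x N single-precision matrices with N a multiple of the tile size; it is illustrative only and is not taken from the thesis.

// Illustrative shared-memory-tiled SGEMM kernel (C = A * B), not thesis code.
// Assumes row-major N x N matrices with N divisible by TILE.
#include <cuda_runtime.h>

#define TILE 16

__global__ void sgemm_tiled(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];   // tile of A staged in shared memory
    __shared__ float Bs[TILE][TILE];   // tile of B staged in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk over the reduction dimension one tile at a time.
    for (int t = 0; t < N / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();                  // tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)    // multiply the two staged tiles
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                  // finish reading before next load
    }
    C[row * N + col] = acc;
}

// Example launch (device pointers dA, dB, dC assumed to be allocated):
//   dim3 block(TILE, TILE);
//   dim3 grid(N / TILE, N / TILE);
//   sgemm_tiled<<<grid, block>>>(dA, dB, dC, N);

Tiling of this kind is one of the loop transformations that a polyhedral code generator can apply automatically, which is what makes GEMM a natural benchmark for comparing generated and hand-tuned kernels.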
April 19, 2012 by hgpu