Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging
Department of Computer Science, University of Erlangen-Nuremberg, Germany
11th International Symposium on Parallel and Distributed Computing (ISPDC), 2012
@conference{membarth2012aoi,
author={Membarth, Richard and Hannig, Frank and Teich, J{\"u}rgen and K{\"o}rner, Mario and Eckert, Wieland},
address={Munich, Germany},
booktitle={Proceedings of the 11th International Symposium on Parallel and Distributed Computing (ISPDC)},
title={Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging},
year={2012},
month={jun},
date={2012-06-25/2012-06-29},
organization={IEEE}
}
Efficient memory bandwidth utilization is crucial for memory-bound applications on GPU accelerators. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels, only a fraction of the compute power provided by GPU accelerators can be exploited, and performance is predetermined by memory bandwidth. As a remedy, this paper investigates how to make optimal use of the available memory bandwidth by increasing the number of in-flight memory transactions. Instead of applying this optimization manually for each GPU accelerator, the required CUDA and OpenCL code is generated automatically from descriptions in a Domain-Specific Language (DSL) for the considered application domain. In addition, the DSL is extended to support global reduction operators. We show that the generated target-specific code significantly improves bandwidth utilization for memory-bound kernels and achieves competitive performance compared to the GPU back end of the widely used image processing library OpenCV.
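To illustrate the general idea of increasing in-flight memory transactions for a memory-bound point operator, below is a minimal hand-written CUDA sketch; it is not the code generated by the paper's DSL, and the kernel names, the scaling operation, and the float4 vectorization are illustrative assumptions. Processing several pixels per thread via wider loads is one common way to put more memory transactions in flight per thread.

// Hypothetical sketch of a memory-bound point operator (not the paper's generated code).
#include <cuda_runtime.h>
#include <cstdio>

// Baseline: one pixel per thread, i.e. one 32-bit load/store per thread.
__global__ void scale_baseline(const float *in, float *out, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

// More in-flight traffic per thread: four pixels per thread through a single
// float4 load/store, so each thread issues one wider 128-bit transaction.
__global__ void scale_vec4(const float4 *in, float4 *out, int n4, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];                    // one 128-bit load instead of four 32-bit loads
        v.x *= a; v.y *= a; v.z *= a; v.w *= a;
        out[i] = v;
    }
}

int main() {
    const int n = 1 << 20;                   // assumed to be a multiple of 4 for the vectorized kernel
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));  // contents are irrelevant for this bandwidth illustration

    dim3 block(256);
    scale_baseline<<<(n + block.x - 1) / block.x, block>>>(d_in, d_out, n, 2.0f);
    scale_vec4<<<((n / 4) + block.x - 1) / block.x, block>>>(
        reinterpret_cast<const float4 *>(d_in),
        reinterpret_cast<float4 *>(d_out), n / 4, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    printf("done\n");
    return 0;
}

In practice, comparing the two kernels with a profiler shows the vectorized variant reaching a larger fraction of peak memory bandwidth on such low-arithmetic kernels; the paper's contribution is to derive this kind of target-specific tuning automatically from the DSL description rather than by hand.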
July 3, 2012 by hgpu