Generating Device-specific GPU code for Local Operators in Medical Imaging
Department of Computer Science, University of Erlangen-Nuremberg, Germany
26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2012
@article{membarth2012generating,
title={Generating Device-specific GPU code for Local Operators in Medical Imaging},
author={Membarth, R. and Hannig, F. and Teich, J. and K{\"o}rner, M. and Eckert, W. and Siemens Healthcare Sector, HIM},
year={2012}
}
To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language, which is embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler translates this high-level description into low-level CUDA and OpenCL code with automatic support for boundary handling and filter masks. Taking the annotated metadata and the characteristics of the parallel GPU execution model into account, two-layered parallel implementations utilizing SPMD and MPMD parallelism are generated. An abstract hardware model of graphics card architectures makes it possible to model GPUs of multiple vendors, such as AMD and NVIDIA, and to generate device-specific code for multiple targets. It is shown that the generated code is faster than manual implementations and those relying on hardware support for boundary handling. Implementations from RapidMind, a commercial framework for GPU programming, are outperformed, and results similar to those of the GPU backend of the widely used image processing library OpenCV are achieved.
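To make the terms "local operator", "filter mask", and "boundary handling" concrete, here is a minimal CPU-side sketch in plain C++. It is not the framework's DSL and not its generated CUDA or OpenCL code; the names `clamp_index` and `apply_local_operator` are hypothetical, and clamping is assumed as the boundary mode only for illustration. Each output pixel is a weighted sum of its neighborhood (a correlation with the mask), and reads outside the image are redirected to the nearest valid pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): clamp an index into [0, size - 1].
inline int clamp_index(int i, int size) {
    return std::max(0, std::min(i, size - 1));
}

// Hypothetical local operator: applies a square mask_size x mask_size filter
// mask to a row-major image. Out-of-bounds reads use clamp boundary handling,
// which is an assumption made for this sketch.
std::vector<float> apply_local_operator(const std::vector<float>& image,
                                        int width, int height,
                                        const std::vector<float>& mask,
                                        int mask_size) {
    assert(mask_size % 2 == 1);  // the mask needs a center element
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(mask.size() == static_cast<std::size_t>(mask_size) * mask_size);

    const int radius = mask_size / 2;
    std::vector<float> out(image.size(), 0.0f);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            // Visit the neighborhood covered by the filter mask.
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Boundary handling: redirect reads to the nearest pixel.
                    const int sx = clamp_index(x + dx, width);
                    const int sy = clamp_index(y + dy, height);
                    const float w =
                        mask[(dy + radius) * mask_size + (dx + radius)];
                    sum += w * image[sy * width + sx];
                }
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}
```

On a GPU, the two outer loops would typically be replaced by one thread per output pixel. The per-pixel index clamping shown here is the kind of boilerplate that the abstract says the source-to-source compiler generates automatically.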
June 1, 2012 by hgpu