Generating Device-specific GPU code for Local Operators in Medical Imaging
Department of Computer Science, University of Erlangen-Nuremberg, Germany
26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2012
To cope with the complexity of programming GPU accelerators for medical imaging computations, we developed a framework to describe image processing kernels in a domain-specific language embedded into C++. The description uses decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access patterns of kernels. A source-to-source compiler translates this high-level description into low-level CUDA and OpenCL code with automatic support for boundary handling and filter masks. Taking the annotated metadata and the characteristics of the parallel GPU execution model into account, two-layered parallel implementations, utilizing SPMD and MPMD parallelism, are generated. An abstract hardware model of graphics card architectures makes it possible to model GPUs from multiple vendors, such as AMD and NVIDIA, and to generate device-specific code for multiple targets. It is shown that the generated code is faster than manual implementations and than implementations relying on hardware support for boundary handling. Implementations from RapidMind, a commercial framework for GPU programming, are outperformed, and results comparable to the GPU backend of the widely used image processing library OpenCV are achieved.
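The abstract itself contains no code, but it describes a compiler that emits low-level CUDA (or OpenCL) kernels for local operators, with boundary handling and filter masks handled automatically. As a rough, hedged illustration of the kind of code such a backend might produce, the sketch below shows a 3x3 convolution kernel with clamp-to-edge boundary handling resolved in software; the kernel name, mask size, boundary mode, and constant-memory mask are illustrative assumptions, not the framework's actual output.

```cuda
#include <cuda_runtime.h>

// Hypothetical 3x3 filter mask placed in constant memory, as a generated
// CUDA backend for a local operator might do.
__constant__ float d_mask[3][3];

// Illustrative shape of generated code for a 3x3 local operator with
// clamp-to-edge boundary handling performed in software rather than via
// texture hardware.
__global__ void convolve3x3_clamp(const float *in, float *out,
                                  int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            // Clamp neighborhood accesses to the image border.
            int ix = min(max(x + dx, 0), width - 1);
            int iy = min(max(y + dy, 0), height - 1);
            sum += d_mask[dy + 1][dx + 1] * in[iy * width + ix];
        }
    }
    out[y * width + x] = sum;
}
```

In the approach described in the abstract, a programmer would not write such a kernel by hand; it would be derived from the high-level kernel description and its access/execute metadata, with variants (e.g. specialized border-region code) generated per target device.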
June 1, 2012 by hgpu