high performance computing on graphics processing units: hgpu.org


GPU-accelerated regularisation of large diffusion-tensor volumes

Tuomo Valkonen, Manfred Liebmann

Institute for Mathematics and Scientific Computing, University of Graz, Austria

Computing, 2013

DOI:10.1007/s00607-012-0277-x

@article{valkonen2013gpu,
  year={2013},
  issn={0010-485X},
  journal={Computing},
  doi={10.1007/s00607-012-0277-x},
  title={GPU-accelerated regularisation of large diffusion-tensor volumes},
  url={http://dx.doi.org/10.1007/s00607-012-0277-x},
  publisher={Springer Vienna},
  keywords={DTI; Regularisation; Medical imaging; GPU; Open ACC; 92C55; 94A08; 26B30; 49M29},
  author={Valkonen, Tuomo and Liebmann, Manfred},
  pages={1-14},
  language={English}
}


We discuss the benefits, difficulties, and performance of a GPU implementation of the Chambolle-Pock algorithm for TGV (total generalised variation) denoising of medical diffusion tensor images. Whereas we have previously studied the denoising of 2D slices of $2 \times 2$ and $3 \times 3$ tensors, attaining satisfactory performance on a normal CPU, here we concentrate on full 3D volumes of data, where each 3D voxel consists of a symmetric $3 \times 3$ tensor. One of the major computational bottlenecks in the Chambolle-Pock algorithm for these problems is that on each iteration, at each voxel of the data set, a tensor potentially needs to be projected to the positive semi-definite cone. In practice this demands the QR algorithm, as explicit solutions are not numerically stable. For a $128 \times 128 \times 128$ data set, for example, the count is 2 megavoxels, which lends itself to massively parallel GPU implementation. Further performance enhancements are obtained by parallelising basic arithmetic operations and differentiation. Since we use the relatively recent OpenACC standard for the GPU implementation, the article includes a study and critique of its applicability.
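The per-voxel projection described in the abstract can be illustrated with a small, serial sketch. This is not the authors' code: the paper uses the QR algorithm on a GPU via OpenACC, whereas the sketch below swaps in cyclic Jacobi rotations for the eigen-decomposition, purely because they are short to write in plain Python. The function names `jacobi_eigh3`, `project_psd3`, and `project_volume` are hypothetical. The projection itself is the standard one: decompose the symmetric tensor, clamp negative eigenvalues to zero, and reassemble.

```python
import math

def jacobi_eigh3(a, max_sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric 3x3 matrix by cyclic Jacobi rotations.

    Assumption: Jacobi is used here for brevity; the paper relies on the
    QR algorithm. Returns (eigenvalues, v), eigenvectors in the columns of v.
    """
    n = 3
    a = [list(map(float, row)) for row in a]   # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries measures distance from diagonal.
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):             # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):             # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):             # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def project_psd3(a):
    """Project a symmetric 3x3 tensor onto the positive semi-definite cone."""
    lam, v = jacobi_eigh3(a)
    lam = [max(x, 0.0) for x in lam]           # clamp negative eigenvalues
    return [[sum(v[i][k] * lam[k] * v[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]

def project_volume(tensors):
    """Project every voxel independently; on a GPU this loop runs in parallel."""
    return [project_psd3(t) for t in tensors]

# Voxel count for the example data set size quoted in the abstract.
voxels = 128 ** 3  # 2,097,152, i.e. about 2 megavoxels
```

Each voxel's projection reads and writes only its own tensor, so the loop in `project_volume` has no cross-iteration dependencies. That independence is the property the abstract points to when it says the problem lends itself to massively parallel implementation.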

Tags: Algorithms, Diffusion tensor, Image processing, Medicine, nVidia, nVidia GeForce GTX 480, OpenACC, Tesla C2070

January 18, 2013 by hgpu
