## GPU-accelerated regularisation of large diffusion-tensor volumes

Institute for Mathematics and Scientific Computing, University of Graz, Austria

Computing, 2013

@article{valkonen2013gpu,

year={2013},

issn={0010-485X},

journal={Computing},

doi={10.1007/s00607-012-0277-x},

title={GPU-accelerated regularisation of large diffusion-tensor volumes},

url={http://dx.doi.org/10.1007/s00607-012-0277-x},

publisher={Springer Vienna},

keywords={DTI; Regularisation; Medical imaging; GPU; OpenACC; 92C55; 94A08; 26B30; 49M29},

author={Valkonen, Tuomo and Liebmann, Manfred},

pages={1-14},

language={English}

}

We discuss the benefits, difficulties, and performance of a GPU implementation of the Chambolle-Pock algorithm for TGV (total generalised variation) denoising of medical diffusion tensor images. Whereas we have previously studied the denoising of 2D slices of $2 \times 2$ and $3 \times 3$ tensors, attaining satisfactory performance on a normal CPU, here we concentrate on full 3D volumes of data, where each 3D voxel consists of a symmetric $3 \times 3$ tensor. One of the major computational bottlenecks in the Chambolle-Pock algorithm for these problems is that on each iteration, at each voxel of the data set, a tensor potentially needs to be projected to the positive semi-definite cone. In practice this demands the QR algorithm, as explicit solutions are not numerically stable. For a $128 \times 128 \times 128$ data set, for example, the count is about 2 megavoxels, which lends itself to a massively parallel GPU implementation. Further performance enhancements are obtained by parallelising basic arithmetic operations and differentiation. Since we use the relatively recent OpenACC standard for the GPU implementation, the article includes a study and critique of its applicability.
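To make the per-voxel projection step concrete, here is a minimal CPU sketch in plain Python. It is not the authors' code. The abstract says the implementation relies on the QR algorithm; this sketch swaps in cyclic Jacobi rotations, another iterative eigenvalue method for symmetric matrices, only because it is short to write out. The function names `jacobi_eigh3` and `project_psd3` and the `sweeps` and `tol` parameters are hypothetical. The projection itself is the standard one: diagonalise the symmetric tensor, clamp negative eigenvalues to zero, and reassemble.

```python
import math

def jacobi_eigh3(a, sweeps=50, tol=1e-30):
    """Eigendecomposition of a symmetric 3x3 matrix by cyclic Jacobi rotations.

    Returns (lam, v): lam[k] is an eigenvalue and column k of v is the
    corresponding unit eigenvector, so that A = V diag(lam) V^T.
    """
    a = [list(row) for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        total = sum(a[i][j] * a[i][j] for i in range(3) for j in range(3))
        off = a[0][1] * a[0][1] + a[0][2] * a[0][2] + a[1][2] * a[1][2]
        if off <= tol * total:  # off-diagonal part is negligible
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Rotation angle that zeroes the (p, q) entry.
            theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            # A <- A J (update columns p and q).
            for k in range(3):
                akp, akq = a[k][p], a[k][q]
                a[k][p] = c * akp - s * akq
                a[k][q] = s * akp + c * akq
            # A <- J^T A (update rows p and q).
            for k in range(3):
                apk, aqk = a[p][k], a[q][k]
                a[p][k] = c * apk - s * aqk
                a[q][k] = s * apk + c * aqk
            # Accumulate eigenvectors: V <- V J.
            for k in range(3):
                vkp, vkq = v[k][p], v[k][q]
                v[k][p] = c * vkp - s * vkq
                v[k][q] = s * vkp + c * vkq
    return [a[0][0], a[1][1], a[2][2]], v

def project_psd3(a):
    """Project a symmetric 3x3 tensor onto the positive semi-definite cone."""
    lam, v = jacobi_eigh3(a)
    lam = [max(x, 0.0) for x in lam]  # clamp negative eigenvalues to zero
    return [[sum(v[i][k] * lam[k] * v[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    # Eigenvalues of this tensor are 3, -1, -1; only the 3 survives.
    tensor = [[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    for row in project_psd3(tensor):
        print(row)
```

Each voxel's projection depends only on that voxel's tensor, so in a $128 \times 128 \times 128$ volume the $128^3 = 2{,}097{,}152$ calls are independent of each other, which is the property that makes the step a natural target for a massively parallel implementation.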

January 18, 2013 by hgpu