Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture
Siemens AG, Healthcare Sector, Imaging & IT Division, P.O. Box 1266, D-91294 Forchheim, Germany
The 12th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, 2013
@inproceedings{univis91372382,
author={Timo Zinsser and Benjamin Keck},
editor={Fully3D committee},
url={http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2013/Zinsser13-SPO.pdf},
location={Lake Tahoe, CA, USA},
booktitle={Proceedings of the 12th Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine},
title={Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture},
pages={225–228},
year={2013},
bibsource={UnivIS, http://univis.uni-erlangen.de/prg?search=publications&id=91372382&show=elong}
}
Filtered back-projection algorithms are widely used for the reconstruction of volumetric data from cone-beam projections in interventional C-arm computed tomography. Furthermore, general-purpose GPUs have become a popular tool for accelerating the reconstruction during time-critical clinical procedures. In this work, we focus on the systematic performance optimization of cone-beam back-projection on the latest architecture of CUDA-enabled GPUs. Our optimization approach is based on the identification of the major performance bottleneck through the analysis of specifically modified kernels. Our main contribution is a smart restructuring of the back-projection algorithm that facilitates the simultaneous processing of a large number of projections and improves the hit rate of the texture cache at the same time. We use the well-known RabbitCT benchmark to demonstrate the outstanding performance of our implementation on a single Kepler-based GeForce GTX 680 GPU. Our implementation performs the back-projection of 496 input projections onto a cubic 512³ volume in less than one second, which is three times as fast as the best competing implementation. Our back-projection implementation is also able to reconstruct a cubic 1024³ volume in about six seconds, which is six times as fast as the best competing implementation known to us.
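For readers unfamiliar with the operation being optimized, the following is a minimal CPU sketch of plain voxel-driven cone-beam back-projection for a single projection. It is not the restructured GPU kernel described in the abstract. It assumes a 3×4 projection matrix that maps world coordinates to homogeneous detector coordinates, bilinear sampling of the projection image, and the 1/w² distance weight customary in FDK-style back-projection. The names `bilinear` and `backproject`, and the parameters `origin` and `spacing`, are hypothetical and chosen only for illustration.

```python
import math


def bilinear(image, u, v):
    """Bilinearly sample image (a list of rows) at column u, row v.

    Pixels outside the image are treated as zero (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    u0, v0 = math.floor(u), math.floor(v)
    fu, fv = u - u0, v - v0

    def pixel(r, c):
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0

    return ((1 - fu) * (1 - fv) * pixel(v0, u0)
            + fu * (1 - fv) * pixel(v0, u0 + 1)
            + (1 - fu) * fv * pixel(v0 + 1, u0)
            + fu * fv * pixel(v0 + 1, u0 + 1))


def backproject(volume, origin, spacing, matrix, image):
    """Accumulate one projection into a cubic volume[z][y][x], in place.

    origin and spacing (hypothetical parameters) place voxel index i at
    world coordinate origin + i * spacing along each axis.
    """
    n = len(volume)
    for k in range(n):
        z = origin + k * spacing
        for j in range(n):
            y = origin + j * spacing
            for i in range(n):
                x = origin + i * spacing
                # Project the voxel centre: (u, v, w) = matrix * (x, y, z, 1).
                u, v, w = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)
                # Dehomogenize, sample the projection, apply the distance weight.
                volume[k][j][i] += bilinear(image, u / w, v / w) / (w * w)
    return volume
```

A full reconstruction would call `backproject` once per projection, each with its own matrix. Back-projecting 496 projections onto a 512³ volume therefore amounts to 496 × 512³ = 66,571,993,088 voxel updates, which gives a sense of the workload the sub-second result above refers to.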
July 27, 2013 by hgpu