GPU acceleration of MOLAR for HRRT List-Mode OSEM reconstructions
Nat. Inst. of Health, Bethesda
IEEE Nuclear Science Symposium Conference Record, 2007. NSS ’07
@conference{barker2007gpu,
title={GPU acceleration of MOLAR for HRRT List-Mode OSEM reconstructions},
author={Barker, W.C. and Thada, S.},
booktitle={Nuclear Science Symposium Conference Record, 2007. NSS'07. IEEE},
volume={4},
pages={3004--3008},
issn={1082-3654},
year={2007},
organization={IEEE}
}
The Siemens ECAT HRRT PET scanner has the potential to produce images of the human brain with spatial resolution better than 3 mm. MOLAR (Motion-compensation OSEM List-mode Algorithm for Resolution-recovery) was developed to provide reconstructions of HRRT data with the best possible accuracy and precision. However, a computer cluster is required to generate reconstructions in a reasonable amount of time. Strategies for computational efficiency have already been implemented in MOLAR, but room for improvement remains. In this study we have begun converting time-consuming components of MOLAR to parallelized code that runs on commodity graphics cards (GPUs) with much faster turnaround. We evaluated the performance of list-mode event forward projections and component-based normalization factor calculations, and we confirmed the numerical accuracy of images reconstructed with GPU-assisted code running on an HP xw8400 workstation with an NVIDIA Quadro FX 4600 graphics card. We evaluated simulated data projected through a 128×128×128 image volume, including the direct calculation of a Gaussian resolution function for simulated list-mode events. This was implemented with both the Cg and CUDA programming APIs for comparison. Both GPU versions ran up to 100 times faster than the CPU-only code. The CUDA version showed some improvement over Cg and was easier to program. We also examined measured Ge-68 phantom data projected through a 256×256×207 image volume, with resolution functions obtained through array lookup rather than by direct calculation. The GPU-assisted code was observed to be up to 14 times faster than the CPU-only code, particularly when one million or more events were processed. Normalization processing was found to be up to 36 times faster. However, speedup decreased to a factor of 3 when disk I/O became dominant as more than one billion events were processed. 
We anticipate further acceleration of MOLAR as we convert other components to GPU-assisted code, in particular backprojection and scatter correction. (Backprojection is on hold until a next-generation GPU with atomic write capability becomes available.)
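
As a rough illustration of the operations the abstract names, the following is a minimal CPU sketch in Python with NumPy; it is not MOLAR's code. The function names (`lor_weights`, `forward_project`, `backproject`), the event format (a pair of endpoints in voxel coordinates), and the dense evaluation of the Gaussian over every voxel are all assumptions made for the example. It shows why forward projection parallelizes easily (each event only reads the image and produces its own value) while backprojection has many events adding into the same voxels, which is the access pattern that calls for atomic writes on a GPU.

```python
import numpy as np


def lor_weights(shape, p0, p1, sigma):
    """Gaussian weight of every voxel centre, based on its perpendicular
    distance to the line through p0 and p1 (hypothetical event format)."""
    axes = [np.arange(n) for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).astype(float)
    p0 = np.asarray(p0, dtype=float)
    direction = np.asarray(p1, dtype=float) - p0
    direction /= np.linalg.norm(direction)
    rel = grid - p0                          # vector from p0 to each voxel
    along = rel @ direction                  # component along the line
    perp = rel - along[..., None] * direction
    dist2 = np.sum(perp ** 2, axis=-1)       # squared distance to the line
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def forward_project(image, events, sigma):
    """One value per event. Each event only reads the image, so events
    could be processed independently (for example, one GPU thread each)."""
    return np.array([
        np.sum(image * lor_weights(image.shape, p0, p1, sigma))
        for p0, p1 in events
    ])


def backproject(shape, events, values, sigma):
    """Accumulate every event into a shared image. With one thread per
    event, these additions would collide on the same voxels, so a parallel
    version needs atomic adds or a different work decomposition."""
    out = np.zeros(shape)
    for (p0, p1), value in zip(events, values):
        out += value * lor_weights(shape, p0, p1, sigma)
    return out


# Usage: a single hot voxel and one line passing through it.
image = np.zeros((16, 16, 16))
image[8, 8, 8] = 1.0
events = [((0, 8, 8), (15, 8, 8))]
projection = forward_project(image, events, sigma=2.0)
```

A practical implementation would restrict the Gaussian to a small neighbourhood around each line rather than evaluating it over the whole volume; the dense form is used here only to keep the sketch short.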
April 3, 2011 by hgpu