Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA

Jing-yu Cui, Guillem Pratx, Sven Prevrhal, Craig S. Levin
Department of Electrical Engineering, Stanford University, Stanford, California 94305
Medical Physics 38, 6775, 2011


   title={Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA.},

   author={Cui, JY and Pratx, G. and Prevrhal, S. and Levin, C.S.},

   journal={Medical physics},






Download Download (PDF)   View View   Source Source   



PURPOSE: List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes it challenging to incorporate accurate system modeling. METHODS: The authors present a novel formulation for computing line projection operations on graphics processing units (GPUs) using the compute unified device architecture (CUDA) framework, and apply the formulation to list-mode ordered-subsets expectation maximization (OSEM) image reconstruction. Our method overcomes well-known GPU challenges such as divergence of compute threads, limited bandwidth of global memory, and limited size of shared memory, while exploiting GPU capabilities such as fast access to shared memory and efficient linear interpolation of texture memory. Execution time comparison and image quality analysis of the GPU-CUDA method and the central processing unit (CPU) method are performed on several data sets acquired on a preclinical scanner and a clinical ToF scanner. RESULTS: When applied to line projection operations for non-ToF list-mode PET, this new GPU-CUDA method is >200 times faster than a single-threaded reference CPU implementation. For ToF reconstruction, we exploit a ToF-specific optimization to improve the efficiency of our parallel processing method, resulting in GPU reconstruction >300 times faster than the CPU counterpart. For a typical whole-body scan with 75x75x26 image matrix, 40.7 million LORs, 33 subsets, and 3 iterations, the overall processing time is 7.7 s for GPU and 42 min for a single-threaded CPU. Image quality and accuracy are preserved for multiple imaging configurations and reconstruction parameters, with normalized root mean squared (RMS) deviation less than 1% between CPU and GPU-generated images for all cases. CONCLUSIONS: A list-mode ToF OSEM library was developed on the GPU-CUDA platform. Our studies show that the GPU reformulation is considerably faster than a single-threaded reference CPU method especially for ToF processing, while producing virtually identical images. This new method can be easily adapted to enable more advanced algorithms for high resolution PET reconstruction based on additional information such as depth of interaction (DoI), photon energy, and point spread functions (PSFs).
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: