Mapping Iterative Medical Imaging Algorithm on Cell Accelerator
Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
International Journal of Biomedical Imaging, Volume 2011, Article ID 843924, 11 pages, 2011
@article{meilian2011mapping,
title={Mapping Iterative Medical Imaging Algorithm on Cell Accelerator},
author={Xu, Meilian and Thulasiraman, Parimala},
journal={International Journal of Biomedical Imaging},
volume={2011},
year={2011},
publisher={Hindawi Publishing Corporation}
}
Algebraic reconstruction techniques require about half as many projections as Fourier backprojection methods, which makes them safer in terms of required radiation dose. The algebraic reconstruction technique (ART) and its variant OS-SART (ordered subset simultaneous ART) provide faster convergence with comparatively good image quality. However, their prohibitively long processing time prevents their adoption in commercial CT machines. Parallel computing is one solution to this problem. With the advent of heterogeneous multicore architectures that exploit data-parallel applications, medical imaging algorithms such as OS-SART can be studied for increased performance. In this paper, we map OS-SART onto the Cell Broadband Engine (Cell BE), using its architectural features to provide an efficient mapping. The Cell BE consists of one PowerPC processor element (PPE) and eight SIMD coprocessors known as synergistic processor elements (SPEs). The limited local memory on each SPE makes the mapping challenging. We therefore present optimization techniques to map the algorithm efficiently onto the Cell BE for improved performance over the CPU version. We compare the performance of our proposed algorithm on the Cell BE to that on a Sun Fire x4600, a shared-memory machine. The Cell BE is five times faster than the AMD Opteron dual-core processor. The speedup of the algorithm on the Cell BE increases with the number of SPEs. We also experiment with various parameters that impact the performance of the algorithm, such as the number of subsets, the number of processing elements, and the number of DMA transfers between main memory and local memory.
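For readers unfamiliar with OS-SART, the update it performs can be sketched in a few lines. This is a minimal sequential illustration in plain Python, not the authors' Cell BE implementation: the dense system matrix `A`, the interleaved partition of rays into subsets, the relaxation parameter `relax`, and the function name `os_sart` are all assumptions made for the example.

```python
def os_sart(A, b, num_subsets, iterations=50, relax=1.0):
    """Sketch of OS-SART for a dense, non-negative system matrix.

    A[i][j] is the (assumed) weight of pixel j along ray i, and b[i] is
    the measured projection for ray i. Rays are split into interleaved
    subsets; the image estimate is updated once per subset.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n                                  # initial image estimate
    row_sums = [sum(row) for row in A]             # per-ray normalisation
    subsets = [list(range(s, m, num_subsets)) for s in range(num_subsets)]

    for _ in range(iterations):
        for subset in subsets:
            correction = [0.0] * n                 # backprojected residuals
            col_sums = [0.0] * n                   # per-pixel normalisation
            for i in subset:
                if row_sums[i] == 0:
                    continue                       # ray misses the image
                forward = sum(A[i][j] * x[j] for j in range(n))
                scaled = (b[i] - forward) / row_sums[i]
                for j in range(n):
                    correction[j] += A[i][j] * scaled
                    col_sums[j] += A[i][j]
            for j in range(n):
                if col_sums[j] > 0:                # pixel seen by this subset
                    x[j] += relax * correction[j] / col_sums[j]
    return x


# Toy 2x2 image probed by row, column, and diagonal rays (hypothetical data).
A = [
    [1, 1, 0, 0], [0, 0, 1, 1],   # rows
    [1, 0, 1, 0], [0, 1, 0, 1],   # columns
    [1, 0, 0, 1], [0, 1, 1, 0],   # diagonals
]
x_true = [1.0, 2.0, 3.0, 4.0]
b = [sum(a * v for a, v in zip(row, x_true)) for row in A]
x_rec = os_sart(A, b, num_subsets=3)
```

The rays inside one subset are independent of each other, which is the data parallelism the abstract refers to. A subset's forward projections and backprojected corrections can be split across processing elements, with the image updated once all of them finish.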
October 22, 2011 by hgpu