Pushing the limits for medical image reconstruction on recent standard multicore processors
J. Treibig, G. Hager, H. G. Hofmann, J. Hornegger, G. Wellein
Erlangen Regional Computing Center, Martensstr. 1, 91058 Erlangen, Germany
arXiv:1104.5243 [cs.PF] (27 Apr 2011)
@article{2011arXiv1104.5243T,
  author        = {{Treibig}, J. and {Hager}, G. and {Hofmann}, H.~G. and {Hornegger}, J. and {Wellein}, G.},
  title         = "{Pushing the limits for medical image reconstruction on recent standard multicore processors}",
  journal       = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint        = {1104.5243},
  primaryClass  = "cs.PF",
  keywords      = {Computer Science - Performance},
  year          = 2011,
  month         = apr,
  adsurl        = {http://adsabs.harvard.edu/abs/2011arXiv1104.5243T},
  adsnote       = {Provided by the SAO/NASA Astrophysics Data System}
}
Volume reconstruction by backprojection is the computational bottleneck in many interventional clinical computed tomography (CT) applications. Today, vendors in this field are replacing special-purpose hardware accelerators with standard hardware such as multicore chips and GPGPUs. This paper presents low-level optimizations for the backprojection algorithm, guided by a thorough performance analysis on four generations of Intel multicore processors (Harpertown, Westmere, Nehalem EX, and Sandy Bridge). We choose the RabbitCT benchmark, a standardized test case well supported in industry, to ensure transparent and comparable results. Our aim is not only to provide the fastest implementation but also to compare against performance models and hardware counter data in order to fully understand the results. We separate the influence of algorithmic optimizations, parallelization, SIMD vectorization, and microarchitectural issues on performance, and we pinpoint problems with current instruction set extensions on standard CPUs (SSE, AVX). Finally, we compare our results to the best GPGPU implementations available for this open competition benchmark.
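For readers unfamiliar with the kernel under discussion, the following is a minimal sketch of a voxel-driven backprojection loop in the style of the RabbitCT benchmark. All names, the 3x4 projection-matrix layout, and the scalar loop structure are illustrative assumptions, not the authors' optimized implementation: each voxel center is projected onto the detector through a homogeneous matrix, the projection image is sampled by bilinear interpolation, and the distance-weighted value is accumulated into the volume.

/* Minimal sketch of a voxel-driven backprojection kernel, in the style of
 * the RabbitCT benchmark; illustrative assumption, not the authors' code. */
#include <stddef.h>

/* vol : L*L*L voxel volume, row-major with z slowest
 * img : one projection image of size w x h
 * A   : 3x4 projection matrix, row-major
 * O   : world coordinate of voxel (0,0,0); sp : isotropic voxel spacing */
static void backproject(float *vol, size_t L,
                        const float *img, size_t w, size_t h,
                        const float A[12], float O, float sp)
{
    for (size_t k = 0; k < L; ++k) {
        float z = O + (float)k * sp;
        for (size_t j = 0; j < L; ++j) {
            float y = O + (float)j * sp;
            for (size_t i = 0; i < L; ++i) {
                float x = O + (float)i * sp;
                /* project the voxel center onto the detector (homogeneous) */
                float u = A[0]*x + A[1]*y + A[2]*z  + A[3];
                float v = A[4]*x + A[5]*y + A[6]*z  + A[7];
                float s = A[8]*x + A[9]*y + A[10]*z + A[11];
                float invs = 1.0f / s;  /* assumes s > 0 for visible voxels */
                float un = u * invs, vn = v * invs;
                /* skip voxels that project outside the image */
                if (!(un >= 0.0f && vn >= 0.0f &&
                      un < (float)(w - 1) && vn < (float)(h - 1)))
                    continue;
                /* bilinear interpolation of the projection image */
                int iu = (int)un, iv = (int)vn;
                float fu = un - (float)iu, fv = vn - (float)iv;
                const float *p = img + (size_t)iv * w + (size_t)iu;
                float val = (1.0f - fv) * ((1.0f - fu)*p[0] + fu*p[1])
                          +         fv  * ((1.0f - fu)*p[w] + fu*p[w+1]);
                /* distance weighting and accumulation into the volume */
                vol[(k*L + j)*L + i] += val * invs * invs;
            }
        }
    }
}

In a RabbitCT-style setting this kernel would be invoked once per projection image with the matrix supplied by the benchmark framework. The perspective divide (1/s) and the data-dependent bilinear gather in the innermost loop are the kinds of operations that complicate SIMD vectorization with SSE and AVX.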
April 29, 2011 by hgpu