high performance computing on graphics processing units: hgpu.org

High-performance cone beam reconstruction using CUDA compatible GPUs

Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara

Graduate School of Information Science and Technology, Osaka University, 1-5 Yamada-oka, Suita, Osaka 565-0871, Japan

Parallel Computing, Vol. 36, No. 2-3. (04 February 2010), pp. 129-141.

DOI:10.1016/j.parco.2010.01.004

@article{okitsu2010high,
  title={High-performance cone beam reconstruction using CUDA compatible GPUs},
  author={Okitsu, Y. and Ino, F. and Hagihara, K.},
  journal={Parallel Computing},
  volume={36},
  number={2-3},
  pages={129--141},
  issn={0167-8191},
  year={2010},
  publisher={Elsevier}
}

Compute unified device architecture (CUDA) is a software development platform that allows C-like programs to run on NVIDIA graphics processing units (GPUs). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction to save memory bandwidth; (2) loop unrolling to hide memory latency; and (3) multithreading to exploit multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also present an analytical model for understanding reconstruction performance in multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) when reconstructing a 512^3-voxel volume from 360 512^2-pixel projections. This throughput is 41% higher than that of the previous CUDA-based method, and the method is 24 times faster than a CPU-based method optimized with vector intrinsics. Detailed analyses show how effectively each acceleration technique improves the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024^3-voxel volume.
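The headline figures in the abstract imply a few derived quantities that make the result easier to compare. The sketch below works them out under stated assumptions: throughput is taken to mean projections divided by total reconstruction time, every voxel is assumed to be updated once per projection (as in standard FDK backprojection), and the two baselines are assumed to have been measured on the same 512^3 / 360-projection dataset. The function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
THROUGHPUT_PPS = 64.3        # projections per second, proposed method
NUM_PROJECTIONS = 360        # number of 512^2-pixel projections
VOLUME_SIDE = 512            # reconstructed volume is 512^3 voxels
RATIO_VS_PREV_CUDA = 1.41    # "41% higher" than the previous CUDA method
RATIO_VS_CPU = 24.0          # "24 times faster" than the CPU method


def derived_figures():
    """Derive implied quantities from the abstract's numbers.

    Assumptions (not stated in the abstract): throughput equals
    projections / total time, each voxel is updated once per
    projection, and both baselines used the same dataset.
    """
    return {
        # Time to backproject all projections at the quoted throughput.
        "recon_seconds": NUM_PROJECTIONS / THROUGHPUT_PPS,
        # Voxel updates per second if every voxel sees every projection.
        "voxel_updates_per_s": VOLUME_SIDE ** 3 * THROUGHPUT_PPS,
        # Implied throughput of the two baselines.
        "prev_cuda_pps": THROUGHPUT_PPS / RATIO_VS_PREV_CUDA,
        "cpu_pps": THROUGHPUT_PPS / RATIO_VS_CPU,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, the full 360-projection reconstruction takes a little under six seconds, which is the figure most directly comparable with timings reported by other reconstruction papers.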

Tags: CUDA, Image processing, Image reconstruction, nVidia, nVidia GeForce 8800 GTX, Tesla C870

November 8, 2010 by hgpu
