high performance computing on graphics processing units: hgpu.org


Improving Cache Locality for Ray Casting with CUDA

Yuki Sugimoto, Fumihiko Ino, Kenichi Hagihara

Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan

25th International Conference on Architecture of Computing Systems Workshops (ARCS 2012 Workshops), 2012

@article{sugimoto2012improving,
  title={Improving Cache Locality for Ray Casting with CUDA},
  author={Sugimoto, Y. and Ino, F. and Hagihara, K.},
  year={2012}
}


In this paper, we present an acceleration method for texture-based ray casting on compute unified device architecture (CUDA) compatible graphics processing units (GPUs). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of thread blocks (TBs) such that each warp, a group of 32 threads that the GPU processes simultaneously, can achieve high data locality for specific viewpoints. The objective of this selection is to allow every warp, rather than every thread, to access data with a small stride, because the GPU executes multiple threads at the same time. In experiments using a GeForce GTX 480 card (i.e., the latest Fermi architecture), we find that the speedup of our method ranges from a factor of 1.0 to 4.0, depending on the viewpoint. We think that optimizing the shape of TBs is important for achieving more cache hits on highly threaded CUDA hardware.
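To illustrate why the best TB shape can depend on the viewpoint, here is a minimal toy sketch in Python. It is not the authors' selection method. It assumes a linear, x-fastest voxel layout with a hypothetical cache line of 32 voxels along x, whereas real texture caches use a different, hardware-specific layout. The names `candidate_shapes`, `lines_touched`, and `best_shape` are illustrative, and `u` and `v` stand for the volume-space steps taken when moving one pixel along the image x and y axes.

```python
from math import floor

WARP_SIZE = 32  # threads per warp on CUDA hardware


def candidate_shapes(warp_size=WARP_SIZE):
    """All (width, height) pixel tiles that one warp can cover."""
    return [(w, warp_size // w)
            for w in range(1, warp_size + 1) if warp_size % w == 0]


def lines_touched(shape, u, v, line=(32, 1, 1)):
    """Count distinct cache lines hit by one warp at a single ray step.

    shape: (width, height) of the pixel tile covered by the warp.
    u, v:  volume-space step per pixel along image x and image y.
    line:  ASSUMED cache-line extent in voxels (hypothetical value).
    """
    w, h = shape
    touched = set()
    for j in range(h):
        for i in range(w):
            # Sample position of thread (i, j), relative to thread (0, 0).
            p = tuple(i * u[k] + j * v[k] for k in range(3))
            touched.add(tuple(floor(p[k] / line[k]) for k in range(3)))
    return len(touched)


def best_shape(u, v, line=(32, 1, 1)):
    """Pick the warp tile that touches the fewest cache lines."""
    return min(candidate_shapes(),
               key=lambda s: lines_touched(s, u, v, line))


# Image x aligned with volume x: a wide, flat tile is preferred.
front = best_shape((1, 0, 0), (0, 1, 0))
# View rotated by 90 degrees: a tall, narrow tile is preferred.
rotated = best_shape((0, 1, 0), (1, 0, 0))
```

Under these assumptions the preferred tile flips from 32 x 1 to 1 x 32 when the view rotates by 90 degrees, which is the kind of viewpoint dependence the abstract describes. A real implementation would measure or model the actual texture cache rather than rely on this simplified line shape.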

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 480, Rendering

March 18, 2012 by hgpu
