high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Improving Cache Locality for Ray Casting with CUDA

Improving Cache Locality for Ray Casting with CUDA

Yuki Sugimoto, Fumihiko Ino, Kenichi Hagihara

Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan

25th International Conference on Architecture of Computing Systems Workshops (ARCS 2012 Workshops), 2012

BibTeX

Download (PDF)

View

Source

2162

views

In this paper, we present an acceleration method for texture-based ray casting on the compute unified device architecture (CUDA) compatible graphics processing unit (GPU). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of thread blocks (TBs) such that each warp, which is a series of 32 threads simultaneously processed on the GPU, can achieve high data locality for specific viewpoints. The objective of this selection is to allow every warp rather than every thread to access data with a small stride, because the GPU executes multiple threads at the same time. In experiments using a GeForce GTX 480 card (i.e., the latest Fermi architecture), we find that the speedup of our method ranges from a factor of 1.0 to that of 4.0, depending on viewpoints. We think that optimizing the shape of TBs is important to achieve more cache hits in the highly-threaded CUDA hardware.

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 480, Rendering

March 18, 2012 by hgpu

No votes yet.

Please wait...

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Improving Cache Locality for Ray Casting with CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)

Improving Cache Locality for Ray Casting with CUDA

Share this:

Recent source codes

Most viewed papers (last 30 days)