high performance computing on graphics processing units: hgpu.org


Cache-efficient numerical algorithms using graphics hardware

Naga K. Govindaraju, Dinesh Manocha

Microsoft Corporation

Parallel Computing, Vol. 33, No. 10-11. (2007), pp. 663-684

DOI:10.1016/j.parco.2007.09.006

@article{govindaraju2007cache,
  title={Cache-efficient numerical algorithms using graphics hardware},
  author={Govindaraju, N.K. and Manocha, D.},
  journal={Parallel Computing},
  volume={33},
  number={10-11},
  pages={663--684},
  issn={0167-8191},
  year={2007},
  publisher={Elsevier}
}


We present cache-efficient algorithms for scientific computations using graphics processing units (GPUs). Our approach is based on mapping the nested loops in the numerical algorithms to the texture mapping hardware and efficiently utilizing GPU caches. This mapping exploits the inherent parallelism, pipelining and high memory bandwidth on GPUs. We further improve the performance of numerical algorithms by accounting for the same relative memory address accesses performed at data elements in nested loops. Based on the similarity of memory accesses performed at the data elements in the input array, we decompose the input arrays into sub-arrays with similar memory access patterns and execute on the sub-arrays for faster execution. Our approach achieves high memory performance on GPUs by tiling the computation and thereby improving the cache-efficiency. Overall, our formulation for GPU-based algorithms extends the current graphics runtime APIs without exposing the underlying hardware complexity to the programmer. This makes it possible to achieve portability and higher performance across different GPUs. We use this approach to improve the performance of GPU-based sorting, fast Fourier transform and dense matrix multiplication algorithms. We also compare our results with prior GPU-based and CPU-based implementations on high-end processors. In practice, we observe 2
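The tiling idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation, which maps nested loops to texture-mapping hardware; it only shows, under simplified assumptions, how a nested-loop matrix multiplication can be split into tiles so that each block of data is reused while it is still likely to be in cache. The function names and the `tile` parameter are hypothetical and chosen for illustration.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for row-major lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile.

    The three outer loops walk over tile origins; the three inner loops
    work inside one tile, so the touched sub-blocks of A, B and C stay
    small. `tile` is an illustrative parameter (assumption), not a value
    taken from the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # min(...) handles dimensions that are not multiples of tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2))
```

Both functions accumulate each output element in the same order over `p`, so they return identical results; only the order in which memory is visited changes. In pure Python the tiled version will not run faster, since interpreter overhead dominates; the sketch is meant to show the loop structure, not to reproduce the performance reported in the paper.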

Tags: Computer science, Memory model, nVidia, nVidia GeForce 6800 Ultra, nVidia GeForce 7900 GTX, nVidia GeForce 8800 GTX, OpenGL

December 5, 2010 by hgpu
