high performance computing on graphics processing units: hgpu.org


Performance Study of LU Decomposition on the Programmable GPU

Fumihiko Ino, Manabu Matsui, Keigo Goda, Kenichi Hagihara

Graduate School of Information Science and Technology, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan

High Performance Computing – HiPC 2005, Lecture Notes in Computer Science, Vol. 3769, pp. 83-94, 2005

DOI:10.1007/11602569_13

@article{ino2005performance,
  title={Performance study of LU decomposition on the programmable GPU},
  author={Ino, F. and Matsui, M. and Goda, K. and Hagihara, K.},
  journal={High Performance Computing--HiPC 2005},
  pages={83--94},
  year={2005},
  publisher={Springer}
}


With the increasing programmability of graphics processing units (GPUs), these units are emerging as an attractive computing platform not only for traditional graphics computation but also for general-purpose computation. In this paper, to study the performance of programmable GPUs, we describe the design and implementation of LU decomposition as an example of numerical computation. To this end, we have developed and evaluated several methods that take different implementation approaches to (a) loop processing, (b) branch processing, and (c) vector processing. The experimental results yield four key findings: (1) dependent loops must be implemented using a render texture to avoid copies in video random access memory (VRAM); (2) in most cases, branch processing is handled more efficiently by the CPU than by the GPU; (3) as Fatahalian et al. state for matrix multiplication, GPUs require higher VRAM cache bandwidth to deliver full performance for LU decomposition; and (4) decomposition results obtained on GPUs usually differ from those obtained on CPUs, mainly because of floating-point division error, which causes the numerical error to grow as the decomposition progresses.
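As a rough illustration of findings (1), (2) and (4), here is a CPU-only Python sketch of LU decomposition with partial pivoting. It is not the authors' Cg/OpenGL implementation. Two preallocated buffers stand in for two render textures that swap roles between passes instead of being copied; the pivot search plays the part of the branch processing kept on the CPU; and `div_rcp` assumes, hypothetically, that the GPU evaluates a / b as a × (1/b) with a separately rounded single-precision reciprocal. All function names (`f32`, `div_ieee`, `div_rcp`, `lu_ping_pong`) are hypothetical.

```python
import struct
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def div_ieee(a: float, b: float) -> float:
    """Correctly rounded single-precision division, as a CPU FPU returns it."""
    return f32(a / b)


def div_rcp(a: float, b: float) -> float:
    """ASSUMPTION: stand-in for GPU division, computed as a * (1/b) with the
    reciprocal rounded on its own. Illustrative, not a model of a real GPU."""
    return f32(a * f32(1.0 / b))


def lu_ping_pong(a: Matrix, div: Callable[[float, float], float]) -> Tuple[Matrix, List[int]]:
    """LU decomposition with partial pivoting, packed into one matrix.

    Returns (lu, perm): unit-diagonal L below the diagonal and U on/above it,
    such that row i of L @ U equals row perm[i] of `a`.
    """
    n = len(a)
    # Two buffers allocated once (hypothetical stand-ins for render textures):
    # each pass reads `src`, writes `dst`, then the roles swap without copying.
    src = [[f32(v) for v in row] for row in a]
    dst = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # Branchy step kept on the "CPU": find the pivot row in column k.
        p = max(range(k, n), key=lambda i: abs(src[i][k]))
        src[k], src[p] = src[p], src[k]
        perm[k], perm[p] = perm[p], perm[k]
        pivot = src[k][k]
        # Data-parallel "GPU" pass: every dst element depends only on src.
        for i in range(n):
            for j in range(n):
                if i <= k or j < k:
                    dst[i][j] = src[i][j]  # already final: pass through
                elif j == k:
                    dst[i][j] = div(src[i][k], pivot)  # multiplier l_ik
                else:
                    l_ik = div(src[i][k], pivot)
                    dst[i][j] = f32(src[i][j] - f32(l_ik * src[k][j]))
        # Dependent loop: the output of pass k is the input of pass k + 1.
        src, dst = dst, src
    return src, perm


if __name__ == "__main__":
    a = [[4.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 2.0, 1.0]]
    lu_cpu, _ = lu_ping_pong(a, div_ieee)
    lu_gpu, _ = lu_ping_pong(a, div_rcp)
    diffs = sum(x != y for r1, r2 in zip(lu_cpu, lu_gpu) for x, y in zip(r1, r2))
    print("packed L/U entries that differ:", diffs)
```

Whether any entries differ between the two division modes depends on the input matrix; for inputs whose quotients are exact in single precision (as in the example matrix) the two results coincide, and differences appear only when the reciprocal's rounding error survives the multiply.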

Tags: Cg, Computer science, Linear Algebra, nVidia, nVidia GeForce FX 5900 Ultra, nVidia Quadro FX 3400, OpenGL

December 26, 2010 by hgpu
