high performance computing on graphics processing units: hgpu.org

Accelerating GPU kernels for dense linear algebra

Rajib Nath, Stanimire Tomov, Jack Dongarra

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville

High Performance Computing for Computational Science – VECPAR 2010, Lecture Notes in Computer Science, 2011, Volume 6449/2011, 83-92

DOI:10.1007/978-3-642-19328-6_10

@article{nath2011accelerating,
  title={Accelerating GPU kernels for dense linear algebra},
  author={Nath, R. and Tomov, S. and Dongarra, J.},
  journal={High Performance Computing for Computational Science–VECPAR 2010},
  pages={83--92},
  year={2011},
  publisher={Springer}
}

Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major building blocks of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting, a set of GPU-specific optimization techniques, allows us to easily remove performance oscillations associated with problem dimensions not divisible by fixed blocking sizes. For example, applied to the matrix-matrix multiplication routines, depending on the hardware configuration and routine parameters, this can lead to algorithms that are two times faster. Similarly, the matrix-vector multiplication can be accelerated more than two times in both single and double precision arithmetic. Additionally, GPU-specific acceleration techniques are applied to develop new kernels (e.g. syrk, symv) that are up to 20× faster than the currently available kernels. We present these kernels and also show their acceleration effect on higher-level dense linear algebra routines. The accelerated kernels are now freely available through the MAGMA BLAS library.
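The abstract names Pointer Redirecting without describing its mechanics. As a rough illustration only, the sketch below emulates on the CPU one plausible reading of the idea for a matrix-vector product: when the row count is not a multiple of the block size, the surplus "threads" in the last block are redirected to the last valid row instead of branching or reading out of bounds, and their redundant results are discarded. The function name `gemv_pointer_redirect`, the block size, and the clamping rule are assumptions made for this example, not the MAGMA BLAS implementation.

```python
def gemv_pointer_redirect(A, x, m, n, block=4):
    """Compute y = A x for an m-by-n matrix A (list of rows).

    Rows are processed in fixed-size blocks, mimicking GPU thread blocks
    in which each thread owns one row. Hypothetical sketch, CPU only.
    """
    y = [0.0] * m
    num_blocks = (m + block - 1) // block  # ceil(m / block)
    for b in range(num_blocks):
        for t in range(block):  # one emulated thread per row slot
            row = b * block + t
            # Redirect: a thread past the last row reads the last valid
            # row, so every load stays inside the matrix.
            src = min(row, m - 1)
            acc = 0.0
            for j in range(n):
                acc += A[src][j] * x[j]
            # Only threads that own a real row store their result;
            # the redundant work of redirected threads is dropped.
            if row < m:
                y[row] = acc
    return y


if __name__ == "__main__":
    # 5 rows with block size 4: the second block has 3 redirected threads.
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    x = [1.0, 1.0]
    print(gemv_pointer_redirect(A, x, m=5, n=2, block=4))
```

In this sketch every emulated thread runs the same inner loop regardless of whether the dimension divides evenly, which is the property that would let a real kernel avoid both padding the matrix and divergent bounds checks inside the loop; the only conditional is the final store.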

Tags: BLAS, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 280

June 4, 2011 by hgpu
