high performance computing on graphics processing units: hgpu.org


Optimizing Sparse Matrix-Matrix Multiplication for the GPU

Steven Dalton, Nathan Bell, Luke N. Olson

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801

University of Illinois at Urbana-Champaign, Technical Report, 2013

@article{dalton2013optimizing,
  title={Optimizing Sparse Matrix-Matrix Multiplication for the GPU},
  author={Dalton, Steven and Bell, Nathan and Olson, Luke N},
  journal={Matrix},
  volume={3},
  year={2013}
}


Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas, from the information sciences to the physical sciences. Implementing SpMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpMM operation into three highly parallel phases (expansion, sorting, and compression) and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general, and our optimizations yield substantial efficiency gains for an SpMM product.
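To make the three phases named in the abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation and omits the bandwidth-saving optimizations. The function name `spmm_esc` and the input format, lists of `(row, col, value)` triples, are assumptions chosen for illustration.

```python
from itertools import groupby


def spmm_esc(a, b):
    """Multiply two sparse matrices given as lists of (row, col, value) triples.

    Illustrative only: a sequential sketch of expansion, sorting, and
    compression, not a parallel or optimized implementation.
    """
    # Index the entries of B by row so each entry A[i, k] can find
    # every B[k, j] it has to be multiplied with.
    b_rows = {}
    for k, j, v in b:
        b_rows.setdefault(k, []).append((j, v))

    # Expansion: emit one partial product for each matching pair
    # (A[i, k], B[k, j]). The same (i, j) coordinate may appear many times.
    expanded = [
        (i, j, a_val * b_val)
        for i, k, a_val in a
        for j, b_val in b_rows.get(k, [])
    ]

    # Sorting: order the partial products by (row, col) so that
    # duplicates of the same coordinate become adjacent.
    expanded.sort(key=lambda t: (t[0], t[1]))

    # Compression: sum each run of entries that share a coordinate.
    return [
        (i, j, sum(v for _, _, v in run))
        for (i, j), run in groupby(expanded, key=lambda t: (t[0], t[1]))
    ]


# Usage: A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] in triple form.
a = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
b = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]
c = spmm_esc(a, b)
```

On a GPU, each phase maps onto a data-parallel primitive (a gather-style expansion, a key sort, and a segmented reduction), which is why the decomposition exposes fine-grained parallelism. The intermediate expanded list is also where most of the memory traffic arises in this formulation.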

Tags: Computer science, CUDA, Matrix multiplication, nVidia, Sorting, Sparse matrix, Tesla C2075

April 7, 2013 by hgpu
