Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Institute for Systems Programming of RAS, 25 Solzhenitsyna street, Moscow, 109004, Russian Federation
In Proceedings of the 5th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2010), Vol. 5952 (2010), pp. 111-125
@inproceedings{monakov2010automatically,
title={Automatically tuning sparse matrix-vector multiplication for GPU architectures},
author={Monakov, A. and Lokhmotov, A. and Avetisyan, A.},
booktitle={High Performance Embedded Architectures and Compilers (HiPEAC 2010)},
pages={111--125},
year={2010},
publisher={Springer}
}
Graphics processors are increasingly used in scientific applications due to their high computational throughput, which derives from hardware with multiple levels of parallelism and a deep memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs because of their irregular memory reference patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
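The paper's actual storage format and tuning parameters are described in the full text; as a rough illustration of the kind of GPU SpMV kernel such a format targets, below is a minimal CUDA sketch of an ELLPACK-style kernel. The layout, names, and padding convention (column-major storage, padding entries marked with a column index of -1) are illustrative assumptions, not the paper's format. The key locality point it demonstrates is that storing entry k of row r at index k * num_rows + r lets consecutive threads, each handling one row, issue coalesced loads.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative ELLPACK-style SpMV kernel (a sketch, not the paper's format).
// Entry k of row r is stored at index k * num_rows + r, so threads working
// on consecutive rows read consecutive addresses and loads coalesce.
__global__ void spmv_ell(int num_rows, int max_nnz_per_row,
                         const int *col_idx, const float *values,
                         const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;

    float sum = 0.0f;
    for (int k = 0; k < max_nnz_per_row; ++k) {
        int idx = k * num_rows + row;   // column-major ELLPACK layout
        int col = col_idx[idx];
        if (col >= 0)                   // skip padding entries (marked -1)
            sum += values[idx] * x[col];
    }
    y[row] = sum;
}

int main()
{
    // Tiny 4x4 example: row 0 = {(0,1),(2,2)}, row 1 = {(1,3)},
    // row 2 = {(0,4),(3,5)}, row 3 = {(2,6)}; rows padded to 2 entries.
    const int n = 4, max_nnz = 2;
    int   h_col[] = {0, 1, 0, 2,   2, -1, 3, -1};
    float h_val[] = {1, 3, 4, 6,   2,  0, 5,  0};
    float h_x[]   = {1, 1, 1, 1};
    float h_y[n];

    int *d_col; float *d_val, *d_x, *d_y;
    cudaMalloc(&d_col, sizeof(h_col));
    cudaMalloc(&d_val, sizeof(h_val));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, sizeof(h_y));
    cudaMemcpy(d_col, h_col, sizeof(h_col), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    spmv_ell<<<1, 128>>>(n, max_nnz, d_col, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("y[%d] = %g\n", i, h_y[i]);  // expected: 3, 3, 9, 6

    cudaFree(d_col); cudaFree(d_val); cudaFree(d_x); cudaFree(d_y);
    return 0;
}

A sliced variant of such a layout would pad each group of S consecutive rows independently rather than padding every row to the global maximum; the slice height S and the number of threads assigned per row are the kind of knobs an auto-tuner could search over, which is presumably what the abstract's "parameter tuning" refers to (an assumption on our part).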