high performance computing on graphics processing units: hgpu.org

Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

Tokyo Institute of Technology, Tokyo, Japan

Procedia Computer Science, Volume 80, Pages 131-142, 2016

DOI:10.1016/j.procs.2016.05.304

@article{nagasaka2016adaptive,
  title={Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU},
  author={Nagasaka, Yusuke and Nukada, Akira and Matsuoka, Satoshi},
  journal={Procedia Computer Science},
  volume={80},
  pages={131--142},
  year={2016},
  publisher={Elsevier}
}

Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with higher parallelism and memory bandwidth than CPUs; however, even on many-core processors SpMV performance is still strongly limited by memory bandwidth, and the low locality of memory accesses to the input vector causes further performance degradation. We propose a new sparse matrix format called the Adaptive Multi-level Blocking (AMB) format, which aggressively reduces the memory traffic in SpMV computation to improve performance. Through several optimization techniques, such as division and blocking of the given matrix, the column indices are compressed and the reuse of input vector elements in the cache is greatly improved. An auto-tuning mechanism determines the best set of parameters for each matrix by estimating the memory traffic and predicting the performance of a given SpMV computation. For 32 matrices taken from the University of Florida Sparse Matrix Collection, the AMB format achieves speedups of up to 2.92x compared to NVIDIA’s cuSPARSE library and up to 1.40x compared to yaSpMV, which was recently proposed and has been the best known library to date for fast SpMV computation.
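
The abstract mentions dividing the matrix so that column indices can be compressed. The following is a minimal CPU-side sketch of that general idea only, not the AMB format itself and not the authors' GPU implementation. It assumes the matrix is split into column blocks at most 65536 columns wide, so that each column index can be stored as a 16-bit offset from the start of its block. The function names, the row-list input layout, and the block width used in the example are hypothetical choices made for illustration.

```python
from array import array


def split_into_column_blocks(n_cols, rows, block_width=65536):
    """Split a sparse matrix into column blocks with 16-bit local indices.

    rows is a list of rows, each a list of (column, value) pairs.
    Returns a list of (col_base, row_ptr, local_cols, values) tuples,
    one CSR-like structure per column block. (Illustrative layout only.)
    """
    if not 0 < block_width <= 65536:
        raise ValueError("block_width must allow 16-bit local indices")
    blocks = []
    for col_base in range(0, n_cols, block_width):
        col_end = min(col_base + block_width, n_cols)
        row_ptr = array("L", [0])
        local_cols = array("H")  # unsigned 16-bit offsets within the block
        values = array("d")
        for row in rows:
            for col, val in row:
                if col_base <= col < col_end:
                    local_cols.append(col - col_base)
                    values.append(val)
            row_ptr.append(len(values))
        blocks.append((col_base, row_ptr, local_cols, values))
    return blocks


def spmv_blocked(blocks, x, n_rows):
    """Compute y = A @ x by accumulating the contribution of each block."""
    y = [0.0] * n_rows
    for col_base, row_ptr, local_cols, values in blocks:
        # Within one block, only x[col_base : col_base + block_width]
        # is read, which narrows the range of input-vector accesses.
        for i in range(n_rows):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_base + local_cols[k]]
            y[i] += acc
    return y


# Small 3 x 8 example; block_width=4 is chosen only to force two blocks.
rows = [[(0, 1.0), (5, 2.0)], [(2, 3.0)], [(1, 4.0), (4, 5.0), (7, 6.0)]]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
blocks = split_into_column_blocks(8, rows, block_width=4)
y = spmv_blocked(blocks, x, n_rows=3)
print(y)  # [13.0, 9.0, 81.0]
```

In this sketch each stored column index takes 2 bytes instead of the 4 bytes typical of a plain CSR layout, at the cost of one row-pointer array per block. Each block also touches only a contiguous slice of the input vector, which is the kind of locality the abstract attributes to division and blocking.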

Tags: Computer science, CUDA, Linear Algebra, nVidia, Sparse matrix, Tesla K20

June 9, 2016 by hgpu

