high performance computing on graphics processing units: hgpu.org

Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format

Wei Cao, Lu Yao, Zongzhe Li, Yongxian Wang, Zhenghua Wang

Nat. Key Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China

International Conference on Computer Application and System Modeling (ICCASM), 2010

DOI:10.1109/ICCASM.2010.5623237

@inproceedings{cao2010implementing,
  title={Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format},
  author={Cao, W. and Yao, L. and Li, Z. and Wang, Y. and Wang, Z.},
  booktitle={Computer Application and System Modeling (ICCASM), 2010 International Conference on},
  volume={11},
  pages={V11–161},
  organization={IEEE},
  year={2010}
}

The Sparse Matrix-Vector product (SpMV) is a key operation in engineering and scientific computing, and efficient parallel implementations of it are critical to the performance of many applications. Modern Graphics Processing Units (GPUs), coupled with general-purpose programming environments such as NVIDIA's CUDA, have gained interest as a viable architecture for data-parallel general-purpose computation. CUDA implementations of SpMV based on common sparse matrix formats have already appeared; among them, the implementation based on the ELLPACK-R format performs best. However, when the maximum number of nonzeros per row differs substantially from the average, the threads in this implementation suffer from load imbalance. This paper proposes a new matrix storage format called ELLPACK-RP, which combines the ELLPACK-R format with the JAD format, and implements SpMV in CUDA on top of it. The results show that it reduces the load imbalance and improves SpMV performance.
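To make the load-imbalance point concrete, here is a minimal CPU-side sketch of ELLPACK-R storage and the per-row loop that a CUDA kernel would run with one thread per row. It is written in plain Python for readability and is not the paper's code. The function names (`to_ellpack_r`, `spmv_ellpack_r`, `imbalance`) and the small test matrix are hypothetical, and the column-major layout is an assumption chosen to mirror how the format is usually laid out for coalesced access. ELLPACK-RP itself is not shown, since the abstract does not describe its layout.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays.

    Returns (values, col_idx, row_len). values and col_idx are padded to the
    longest row and stored column-major: entry k of row i sits at
    index k * n_rows + i. row_len[i] is the number of nonzeros in row i.
    """
    n_rows = len(dense)
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    row_len = [len(r) for r in rows]
    width = max(row_len, default=0)
    values = [0.0] * (width * n_rows)
    col_idx = [0] * (width * n_rows)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            values[k * n_rows + i] = float(v)
            col_idx[k * n_rows + i] = j
    return values, col_idx, row_len


def spmv_ellpack_r(values, col_idx, row_len, x):
    """Compute y = A @ x from ELLPACK-R arrays."""
    n_rows = len(row_len)
    y = [0.0] * n_rows
    for i in range(n_rows):           # on a GPU, each i would be one thread
        acc = 0.0
        for k in range(row_len[i]):   # loop length differs per row
            idx = k * n_rows + i
            acc += values[idx] * x[col_idx[idx]]
        y[i] = acc
    return y


def imbalance(row_len):
    """Ratio of the maximum nonzeros per row to the average (1.0 = balanced)."""
    avg = sum(row_len) / len(row_len)
    return max(row_len) / avg if avg else 0.0


# Hypothetical 4x4 matrix with one long row among short ones.
A = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [3, 4, 5, 6],
    [0, 0, 7, 0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_len = to_ellpack_r(A)
y = spmv_ellpack_r(values, col_idx, row_len, x)
ratio = imbalance(row_len)
```

In this example the row lengths are 1, 1, 4 and 1, so the thread handling the third row performs four multiply-adds while the others perform one. Threads that execute in lockstep would wait for the longest row, which is the imbalance the abstract refers to.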

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, Sparse matrix

June 20, 2011 by hgpu
