high performance computing on graphics processing units: hgpu.org

Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU

Jilin Zhang, Enyi Liu, Jian Wan, Yongjian Ren, Miao Yue, Jue Wang

Department of Computer and Technology, Hangzhou Dianzi University, 310018, Hangzhou, Zhejiang, China

Applied Mathematics & Information Sciences, Volume 7, p.473-482, 2013

Parallel programming is moving from single-core to multicore architectures. Graphics Processing Units (GPUs) have recently emerged as outstanding platforms for data-parallel applications with regular data access patterns. However, it remains challenging to optimize computations with irregular data access patterns, such as sparse matrix-vector multiplication (SPMV). SPMV is one of the most important computational kernels in engineering practice and scientific computation. Various storage formats for sparse matrices have been implemented on GPUs to maximize performance. In this paper, we propose and evaluate a new implementation of SPMV on GPU based on the QCSR storage format, which combines the quadtree storage format with the CSR format. We also outline some optimization strategies to improve performance. Compared with a previously published implementation based on the BCSR format, it achieves higher overall performance, with an average speedup of 1.15x.
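To make the irregular access pattern concrete, here is a minimal sketch of SPMV over the plain CSR format that QCSR builds on. It is not the paper's QCSR kernel, whose layout the abstract does not describe, and it runs sequentially on the CPU rather than on a GPU. The function name `csr_spmv` and the array names `row_ptr`, `col_idx` and `values` are illustrative assumptions, not identifiers from the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero,
    values[k] is its value. (Names are illustrative assumptions.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so the access is indirect
            # and depends on the sparsity pattern of the matrix.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example 3x3 matrix:
# [[4, 0, 1],
#  [0, 0, 2],
#  [3, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [7.0, 6.0, 13.0]
```

The gather `x[col_idx[k]]` is the data-dependent read that makes SPMV harder to optimize than kernels with regular access patterns. Rows also hold different numbers of nonzeros, so a simple one-row-per-thread mapping on a GPU would give threads unequal amounts of work.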

Tags: Computer science, CUDA, nVidia, Sparse matrix

January 12, 2013 by hgpu
