
Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Weizhi Xu, Zhiyong Liu, Dongrui Fan, Shuai Jiao, Xiaochun Ye, Fenglong Song, Chenggang Yan
Key Lab of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
World Academy of Science, Engineering and Technology, Issue 61, 2012

@article{xu2012accelerating,
   title={Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs},
   author={Xu, W. and Liu, Z. and Fan, D. and Jiao, S. and Ye, X. and Song, F. and Yan, C.},
   year={2012}
}

Many-core GPUs provide high compute capability and substantial bandwidth; however, optimizing irregular applications such as SpMV on GPUs remains a difficult but worthwhile task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit the GPU architecture more efficiently. When the HYB-R format is built, the COO portion of the matrix is recursively partitioned into an ELL portion and a COO portion, so that as many non-zeros as possible end up in ELL format. How to partition the matrix is an important question for the HYB-R kernel, so we also tune the partitioning parameters for higher performance. Experimental results show that our method outperforms the fastest kernel (HYB) in NVIDIA's SpMV library, with speedups of up to 17%.
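The abstract already outlines the central idea: instead of splitting the matrix once into an ELL part and a COO part (as NVIDIA's HYB format does), the leftover COO part is itself split again, recursively, so that more non-zeros land in the regular, GPU-friendly ELL layout. The sketch below is not the authors' code; it is a minimal host-side illustration of that recursive split, assuming row-grouped COO input and a hypothetical list of per-level ELL widths that stands in for the tuning parameters the abstract mentions.

```cpp
#include <cstddef>
#include <vector>

// One ELL slab: 'width' padded columns per row (-1 / 0.0 mark padding).
struct EllPart {
    int rows = 0, width = 0;
    std::vector<int>    col;   // rows * width entries
    std::vector<double> val;   // rows * width entries
};

// Leftover entries kept in coordinate (COO) form.
struct CooPart {
    std::vector<int>    row, col;
    std::vector<double> val;
};

struct HybR {
    std::vector<EllPart> ell_levels;  // one ELL slab per recursion level
    CooPart              coo;         // whatever remains after the last level
};

// Recursively peel ELL slabs off a COO matrix.  At each level the first
// 'width' entries of every row go into an ELL slab; the overflow is passed
// on to the next level.  'widths' plays the role of the tuning parameters:
// choosing where to cut the matrix is the HYB-R partitioning problem.
static void split_level(const CooPart& in, int rows,
                        const std::vector<int>& widths, std::size_t level,
                        HybR& out) {
    if (level == widths.size() || in.val.empty()) {
        out.coo = in;                       // no more levels: keep the tail as COO
        return;
    }
    const int width = widths[level];
    EllPart ell;
    ell.rows  = rows;
    ell.width = width;
    ell.col.assign(static_cast<std::size_t>(rows) * width, -1);
    ell.val.assign(static_cast<std::size_t>(rows) * width, 0.0);

    std::vector<int> filled(rows, 0);       // entries already placed per row
    CooPart rest;
    for (std::size_t k = 0; k < in.val.size(); ++k) {
        const int r = in.row[k];
        if (filled[r] < width) {            // fits in this level's ELL slab
            const std::size_t slot =
                static_cast<std::size_t>(r) * width + filled[r]++;
            ell.col[slot] = in.col[k];
            ell.val[slot] = in.val[k];
        } else {                            // overflow: defer to the next level
            rest.row.push_back(in.row[k]);
            rest.col.push_back(in.col[k]);
            rest.val.push_back(in.val[k]);
        }
    }
    out.ell_levels.push_back(std::move(ell));
    split_level(rest, rows, widths, level + 1, out);
}

// Entry point: build a HYB-R-style layout from COO input.
HybR build_hyb_r(const CooPart& coo, int rows, const std::vector<int>& widths) {
    HybR out;
    split_level(coo, rows, widths, 0, out);
    return out;
}
```

In an actual GPU implementation the ELL slabs would presumably be stored column-major so that consecutive threads read consecutive memory, and each slab plus the final COO tail would be handled by its own SpMV kernel launch, in the spirit of NVIDIA's HYB kernel; those details are not shown in this sketch.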
