high performance computing on graphics processing units: hgpu.org

A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

Ping Guo, He Huang, Qichang Chen, Liqiang Wang, En-Jui Lee, Po Chen

University of Wyoming

Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery, TG ’11, 2011

DOI:10.1145/2016741.2016744

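@inproceedings{guo2011model,
  title={A Model-Driven Partitioning and Auto-tuning Integrated Framework for Sparse Matrix-Vector Multiplication on GPUs},
  author={Guo, P. and Huang, H. and Chen, Q. and Wang, L. and Lee, E.J. and Chen, P.},
  booktitle={Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery},
  series={TG '11},
  year={2011},
  doi={10.1145/2016741.2016744}
}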

Sparse Matrix-Vector Multiplication (SpMV) is very common in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate storage formats and for auto-tuning the configurations of CUDA kernels to improve the performance of SpMV on GPUs. The paper makes the following contributions: (1) an empirical CUDA performance model that predicts the execution time of SpMV CUDA kernels; (2) the design and implementation of a model-driven partitioning framework that predicts how to partition the target sparse matrix into one or more partitions and transforms each partition into an appropriate storage format, motivated by the fact that the storage format of a sparse matrix can significantly affect SpMV performance; and (3) the integration of model-driven partitioning with our previous auto-tuning framework, which automatically adjusts CUDA-specific parameters to optimize performance on specific GPUs. Compared to NVIDIA's existing implementations, our approach shows a substantial performance improvement: on average, 222%, 197%, and 33% for the CSR vector kernel, the ELL kernel, and the HYB kernel, respectively.
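To make the storage formats named in the abstract concrete, here is a minimal, CPU-only pure-Python sketch (not the authors' code) of SpMV in CSR form and in HYB form, where HYB keeps the first K nonzeros of each row in a padded ELL part and spills the rest into a COO list. The rule in `choose_ell_width` is a hypothetical linear cost with made-up weights `ell_cost` and `coo_cost`; it only stands in for, and is not, the paper's empirical CUDA performance model. All function names are illustrative.

```python
# Minimal sketch of CSR and HYB (ELL + COO) SpMV formats.
# choose_ell_width uses a HYPOTHETICAL cost model, not the paper's
# empirical CUDA performance model.

def dense_to_csr(rows):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A x with A in CSR. Here rows are processed sequentially; GPU
    kernels map each row to a thread (scalar) or a warp (vector)."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def choose_ell_width(row_lengths, ell_cost=1.0, coo_cost=3.0):
    """Pick the ELL width K minimizing a hypothetical linear cost:
    ell_cost * (padded ELL slots) + coo_cost * (entries spilled to COO)."""
    n = len(row_lengths)
    best_k, best_cost = 0, None
    for k in range(max(row_lengths, default=0) + 1):
        spill = sum(max(0, r - k) for r in row_lengths)
        cost = ell_cost * n * k + coo_cost * spill
        if best_cost is None or cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def csr_to_hyb(values, col_idx, row_ptr, k):
    """Split CSR into an ELL part (first k entries per row, zero-padded)
    and a COO remainder of (row, col, value) triples."""
    n = len(row_ptr) - 1
    ell_vals = [[0] * k for _ in range(n)]
    ell_cols = [[0] * k for _ in range(n)]  # padding: value 0 at column 0
    coo = []
    for i in range(n):
        for slot, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            if slot < k:
                ell_vals[i][slot] = values[p]
                ell_cols[i][slot] = col_idx[p]
            else:
                coo.append((i, col_idx[p], values[p]))
    return ell_vals, ell_cols, coo


def spmv_hyb(ell_vals, ell_cols, coo, x):
    """y = A x for HYB: ELL part row by row, then add the COO entries."""
    y = [sum(v * x[c] for v, c in zip(vals, cols))
         for vals, cols in zip(ell_vals, ell_cols)]
    for i, j, v in coo:
        y[i] += v * x[j]
    return y
```

For example, with A = [[4, 0, 1, 0], [0, 2, 0, 0], [1, 3, 5, 7], [0, 0, 0, 6]] and x = [1, 2, 3, 4], the default weights pick K = 2, spill two entries of row 2 into COO, and both `spmv_csr` and `spmv_hyb` return [7, 4, 50, 24]. In the framework the abstract describes, the choice of partitions and formats is driven by predicted kernel execution times on the target GPU rather than by fixed weights like these.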

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 295, Performance, Software Engineering, Sparse matrix

September 19, 2011 by hgpu
