high performance computing on graphics processing units: hgpu.org


Auto-SpMV: Automated Optimizing SpMV Kernels on GPU

Mina Ashoury, Mohammad Loni, Farshad Khunjush, Masoud Daneshtalab

Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran

arXiv:2302.05662 [cs.DC], (11 Feb 2023)

DOI:10.48550/arXiv.2302.05662

@misc{https://doi.org/10.48550/arxiv.2302.05662,
  doi={10.48550/ARXIV.2302.05662},
  url={https://arxiv.org/abs/2302.05662},
  author={Ashoury, Mina and Loni, Mohammad and Khunjush, Farshad and Daneshtalab, Masoud},
  keywords={Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences},
  title={Auto-SpMV: Automated Optimizing SpMV Kernels on GPU},
  publisher={arXiv},
  year={2023},
}


Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Because they provide massive parallelism and high memory bandwidth, GPUs are commonly used to accelerate SpMV kernels. Prior studies have mainly focused on reducing the latency of SpMV kernels on GPUs; few attempts have been made to improve their energy efficiency, which has kept GPUs out of low-power applications. Furthermore, while prior work has primarily focused on optimizing the sparse format of SpMV kernels, the literature has not evaluated the impact of tweaking compilation parameters. Lastly, little attention has been paid to preparing a comprehensive training dataset of SpMV kernel runs and to fine-tuning the learning hyperparameters. To address these limitations, we present a novel framework, dubbed Auto-SpMV, that enables energy-efficient and low-latency SpMV kernels on GPUs. To achieve the best run-time performance, Auto-SpMV offers two optimization modes: compile-time and run-time. In the compile-time mode, Auto-SpMV tweaks the compilation parameters, while in the run-time mode, it selects the best sparse format for the input sparse matrix. To achieve the best classification results, 1) we collect the largest dataset to date, comprising 30 different sparse matrices run with more than 15K different configurations, and 2) we boost the classification models by automatically fine-tuning the learning hyperparameters. Experimental results show that, in the compile-time mode, Auto-SpMV improves latency, energy consumption, average power, and energy efficiency by up to 51.9%, 52%, 33.2%, and 53%, respectively, compared to the default setting. In the run-time mode, it improves average power and energy efficiency by up to 34.6% and 99.7%, respectively, compared to the default setting.
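For readers new to the terms in the abstract, the sketch below shows what an SpMV kernel computes and why the choice of sparse format matters. It is a minimal CPU-side Python sketch, not the Auto-SpMV implementation: CSR and ELL are assumed here only as two representative formats, the helper names (`dense_to_csr`, `spmv_csr`, `dense_to_ell`, `spmv_ell`, `energy_metrics`) are hypothetical, and the energy-efficiency helper assumes efficiency is measured as floating-point operations per joule with 2 × nnz operations per SpMV, which may differ from the paper's own definition.

```python
"""Illustrative SpMV sketch: the same y = A x computed from two sparse formats.

Assumptions: CSR and ELL are illustrative choices, the helpers are hypothetical,
and everything runs on the CPU; this is not the Auto-SpMV code.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_csr(a: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """y = A x over CSR; each row's loop length equals its nonzero count."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dense_to_ell(a: Matrix) -> Tuple[List[List[int]], List[List[float]]]:
    """Convert to ELL: every row is padded to the longest row's nonzero count."""
    rows = [[(j, v) for j, v in enumerate(r) if v != 0.0] for r in a]
    width = max((len(r) for r in rows), default=0)
    # Padding entries use column 0 with value 0.0, so they contribute nothing to y.
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def spmv_ell(cols: List[List[int]], vals: List[List[float]],
             x: List[float]) -> List[float]:
    """y = A x over ELL; every row does the same (padded) amount of work."""
    return [sum((v * x[j] for j, v in zip(cr, vr)), 0.0)
            for cr, vr in zip(cols, vals)]


def energy_metrics(latency_s: float, avg_power_w: float,
                   nnz: int) -> Tuple[float, float]:
    """Hypothetical helper: energy = average power * latency (J),
    efficiency = FLOPs per joule, assuming 2 * nnz FLOPs per SpMV."""
    energy_j = avg_power_w * latency_s
    return energy_j, 2 * nnz / energy_j


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0], [0.0, 0.0, 0.0], [2.0, 3.0, 0.0]]
    x = [1.0, 2.0, 3.0]
    print(spmv_csr(*dense_to_csr(A), x))  # → [7.0, 0.0, 8.0]
    print(spmv_ell(*dense_to_ell(A), x))  # → [7.0, 0.0, 8.0]
```

Both formats produce the same result vector; what changes is the memory layout. ELL pads every row to the same width, which gives GPU threads a regular access pattern but wastes work when row lengths are very uneven (the empty middle row above still does two padded multiply-adds), while CSR avoids padding at the cost of irregular per-row loop lengths. This kind of matrix-dependent trade-off is what a run-time format selector has to weigh.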

Tags: Computer science, CUDA, Energy efficiency, Linear Algebra, nVidia, nVidia GeForce GTX 1080, nVidia GeForce GTX 1650, Sparse matrix

February 26, 2023 by hgpu

