high performance computing on graphics processing units: hgpu.org

Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication

Ping Guo, Liqiang Wang

Department of Computer Science, University of Wyoming, USA

2012 International Conference on High Performance Computing & Simulation (HPCS 2012), 2012

This paper presents an integrated analytical and profile-based CUDA performance modeling approach that accurately predicts the kernel execution times of sparse matrix-vector multiplication (SpMV) for the CSR, ELL, COO, and HYB CUDA kernels. In experiments on a collection of 8 widely used test matrices on an NVIDIA Tesla C2050, the execution times predicted by our model closely match the measured execution times of NVIDIA’s SpMV implementations. Specifically, for 29 of the 32 test cases, the performance differences are under or around 7%. For the remaining 3 test cases, the differences are between 8% and 10%. For the CSR, ELL, COO, and HYB SpMV kernels, the average differences are 4.2%, 5.2%, 1.0%, and 5.7%, respectively.
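For readers unfamiliar with the operation being modeled, the following is a minimal sequential sketch of SpMV over the CSR (compressed sparse row) format, written in Python for readability. It is not the authors' performance model and not NVIDIA's CUDA kernel; the function name `spmv_csr`, the array names, and the 3x3 example matrix are illustrative assumptions. It only shows the arithmetic that a CSR SpMV kernel carries out, which a GPU implementation would distribute across threads.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Hypothetical helper for illustration only.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk the nonzeros of this row and accumulate the dot product.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (an assumption, not one of the paper's test matrices):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

The ELL, COO, and HYB formats named in the abstract store the same nonzeros with different layouts, which changes the memory access pattern of the kernel rather than the result.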

Tags: Computer science, CUDA, nVidia, Sparse matrix, Tesla C2050

May 19, 2012 by hgpu
