Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication
Department of Computer Science, University of Wyoming, USA
2012 International Conference on High Performance Computing & Simulation (HPCS 2012), 2012
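@inproceedings{guo2012accurate,
title={Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication},
author={Guo, P. and Wang, L.},
booktitle={2012 International Conference on High Performance Computing \& Simulation (HPCS 2012)},
year={2012}
}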
This paper presents an integrated analytical and profile-based CUDA performance modeling approach that accurately predicts the kernel execution times of sparse matrix-vector multiplication (SpMV) for the CSR, ELL, COO, and HYB SpMV CUDA kernels. In experiments on a collection of 8 widely used test matrices on an NVIDIA Tesla C2050 (8 matrices × 4 kernel formats = 32 test cases), the execution times predicted by our model closely match the measured execution times of NVIDIA’s SpMV implementations. Specifically, for 29 of the 32 test cases, the differences are under or around 7%; for the remaining 3 test cases, they are between 8% and 10%. For the CSR, ELL, COO, and HYB SpMV kernels, the average differences are 4.2%, 5.2%, 1.0%, and 5.7%, respectively.
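All four kernels whose execution times the model predicts compute the same product, y = Ax, and differ only in how A is stored. The sketch below is illustrative only and is not the authors' model. It shows the CSR layout with a serial reference SpMV. In NVIDIA's SpMV library, the CSR "scalar" kernel maps the outer row loop to one thread per row and the "vector" kernel maps it to one warp per row. The abstract does not define how "performance difference" is computed, so `percent_difference` assumes the absolute error relative to the measured time. The names `CSRMatrix`, `csr_spmv`, and `percent_difference` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSRMatrix:
    """Compressed Sparse Row storage (hypothetical helper type).

    row_ptr has n_rows + 1 entries; the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], with matching column indices
    in col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]

    @property
    def n_rows(self) -> int:
        return len(self.row_ptr) - 1


def csr_spmv(a: CSRMatrix, x: List[float]) -> List[float]:
    """Serial reference for y = A x; a CUDA CSR kernel parallelizes the row loop."""
    y = []
    for i in range(a.n_rows):
        start, end = a.row_ptr[i], a.row_ptr[i + 1]
        # Dot product of row i's nonzeros with the gathered entries of x.
        y.append(sum(a.values[k] * x[a.col_idx[k]] for k in range(start, end)))
    return y


def percent_difference(predicted: float, measured: float) -> float:
    """Assumed metric: |predicted - measured| relative to measured, in percent."""
    return abs(predicted - measured) / measured * 100.0


# 3x3 example matrix:
# [1 0 2]
# [0 3 0]
# [4 0 5]
A = CSRMatrix(row_ptr=[0, 2, 3, 5], col_idx=[0, 2, 1, 0, 2],
              values=[1.0, 2.0, 3.0, 4.0, 5.0])
print(csr_spmv(A, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Under this assumed metric, a predicted time of 1.07 ms against a measured 1.00 ms would count as a 7% difference. The irregular row lengths visible in `row_ptr` are what make CSR kernel timing data-dependent, which is one reason a purely analytical model is hard to make accurate.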
May 19, 2012 by hgpu