8775

Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU

Jilin Zhang, Enyi Liu, Jian Wan, Yongjian Ren, Miao Yue, Jue Wang
Department of Computer and Technology, Hangzhou Dianzi University, 310018, Hangzhou, Zhejiang, China
Applied Mathematics & Information Sciences, Volume 7, p.473-482, 2013
@article{zhang2013implementing,

   title={Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU},

   author={Zhang, J. and Liu, E. and Wan, J. and Ren, Y. and Yue, M. and Wang, J.},

   journal={Appl. Math},

   volume={7},

   number={2},

   pages={473–482},

   year={2013}

}

Download Download (PDF)   View View   Source Source   

511

views

We are going through the computation from single core to multicore architecture in parallel programming. Graphics Processor Units (GPUs) have recently emerged as outstanding platforms for data parallel applications with regular data access patterns. However, it is still challenging to optimize computations with irregular data access patterns like sparse matrix-vector multiplication (SPMV). SPMV is one of the most important computational kernels in engineering practice and scientific computation. Various data formats to store the sparse matrix have been implemented on GPUs to maximize the performance. In this paper, we propose and evaluate a new implementation of SPMV on GPU based on QCSR storage format which combines the quadtree storage format and CSR format. We also outline some optimization strategies to improve performance. In comparison with previously published implementation, it achieves higher overall performance than BCSR format. The results show that it achieves 1.15 speedup averagely than BCSR format.
VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)

* * *

* * *

Like us on Facebook

HGPU group

243 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1467 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: