High-Performance Matrix-Vector Multiplication on the GPU

Hans Henrik Brandenborg Sørensen
Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800 Lyngby, Denmark
Springer-Verlag Berlin Heidelberg, pp. 377-386, 2012

@article{sorensen2012high,
   title={High-Performance Matrix-Vector Multiplication on the GPU},
   author={S{\o}rensen, H.H.B.},
   year={2012}
}


In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
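
The two ideas in the abstract, splitting the work of each row among several workers and choosing that split by measurement, can be illustrated with a small serial sketch. This is not the paper's kernel: it is a plain Python emulation, and the names `gemv_reference`, `gemv_split_rows`, `autotune`, and `workers_per_row` are hypothetical, introduced only for illustration. Timings taken this way on a CPU say nothing about GPU performance; the loop only shows the shape of an auto-tuning search.

```python
import time


def gemv_reference(a, x, m, n):
    """Compute y = A x for a row-major m-by-n matrix stored as a flat list."""
    return [sum(a[i * n + j] * x[j] for j in range(n)) for i in range(m)]


def gemv_split_rows(a, x, m, n, workers_per_row):
    """Emulate several workers cooperating on each row (assumed scheme).

    Each worker accumulates a strided partial dot product; the partial
    sums are then reduced to give one entry of y.
    """
    y = []
    for i in range(m):
        partials = [0.0] * workers_per_row
        for w in range(workers_per_row):
            for j in range(w, n, workers_per_row):
                partials[w] += a[i * n + j] * x[j]
        y.append(sum(partials))  # reduction step
    return y


def autotune(m, n, candidates=(1, 2, 4, 8), repeats=3):
    """Time each candidate split on an m-by-n problem and return the fastest."""
    a = [float((k % 7) + 1) for k in range(m * n)]
    x = [1.0] * n
    best, best_time = None, float("inf")
    for w in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            gemv_split_rows(a, x, m, n, w)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = w, elapsed
    return best
```

For example, `autotune(64, 256)` returns whichever of the candidate values ran fastest for that shape; in a real tuner the chosen value would typically be recorded per matrix shape, since a tall, skinny matrix and a short, wide one may favor different splits.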