High-Performance Matrix-Vector Multiplication on the GPU

Hans Henrik Brandenborg Sørensen
Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800 Lyngby, Denmark
Springer-Verlag Berlin Heidelberg, pp. 377-386, 2012








In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
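To make the idea of fine-grained parallelism concrete, the following is a minimal sketch in plain Python that emulates, sequentially on the CPU, one common way of splitting y = Ax across GPU threads. It is not the kernel from the paper: the function name `matvec_emulated` and the parameter `threads_per_row` are hypothetical, and the decomposition (strided partial sums per row followed by a pairwise reduction) is an assumption chosen for illustration.

```python
def matvec_emulated(A, x, threads_per_row=4):
    """Compute y = A x, emulating a GPU-style work split on the CPU.

    Assumption (not from the paper): each row is handled by
    `threads_per_row` hypothetical threads. Thread t accumulates the
    columns t, t + T, t + 2T, ... and the partial sums are then
    combined with a pairwise (tree) reduction.
    """
    if threads_per_row < 1:
        raise ValueError("threads_per_row must be positive")
    y = []
    for row in A:
        if len(row) != len(x):
            raise ValueError("row length does not match vector length")
        # Strided partial sums: one entry per emulated thread.
        partial = [
            sum(row[j] * x[j] for j in range(t, len(row), threads_per_row))
            for t in range(threads_per_row)
        ]
        # Tree reduction: roughly halve the number of live entries per step.
        n = len(partial)
        while n > 1:
            half = (n + 1) // 2
            for t in range(n - half):
                partial[t] += partial[t + half]
            n = half
        y.append(partial[0])
    return y


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    x = [1, 0, -1]
    print(matvec_emulated(A, x, threads_per_row=2))
```

In a sketch like this, `threads_per_row` plays the role of a tunable parameter: a short, wide matrix would plausibly favor many threads per row, while a tall, narrow one would favor few. An auto-tuner in this spirit would time several candidate values for a given shape and keep the fastest.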