High-Performance Matrix-Vector Multiplication on the GPU
Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800 Lyngby, Denmark
Springer-Verlag Berlin Heidelberg, pp. 377-386, 2012
@article{sorensen2012high,
title={High-Performance Matrix-Vector Multiplication on the GPU},
author={S{\o}rensen, H. H. B.},
year={2012},
pages={377--386},
publisher={Springer-Verlag Berlin Heidelberg}
}
In this paper, we develop a high-performance GPU kernel for one of the most widely used dense linear algebra operations, matrix-vector multiplication. The target hardware is the Nvidia Tesla 20-series (Fermi architecture), the most recent generation at the time of writing, which was designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
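To make the abstract's two ideas more concrete, here is a minimal CPU-side sketch in Python. It assumes a row-major matrix and a kernel design in which several threads cooperate on each row. It is not the paper's kernel: the names `gemv_threads_per_row`, `tree_reduce` and `autotune`, and the candidate values for `threads_per_row`, are hypothetical. Each emulated thread accumulates a strided partial sum of one row, so neighbouring threads touch neighbouring elements (the pattern that lets GPU loads coalesce). The partials are combined with a pairwise tree reduction of the kind typically done in shared memory. The auto-tuner simply times each candidate and keeps the fastest configuration that reproduces a reference result.

```python
import time
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def tree_reduce(partials: List[float]) -> float:
    """Pairwise (tree) reduction, like a shared-memory reduction in a thread block."""
    vals = list(partials)
    while len(vals) > 1:
        # Fold the upper half onto the lower half; with an odd count the
        # middle element is simply carried to the next round.
        half = (len(vals) + 1) // 2
        for i in range(len(vals) - half):
            vals[i] += vals[i + half]
        vals = vals[:half]
    return vals[0] if vals else 0.0


def gemv_threads_per_row(A: Matrix, x: Sequence[float], threads_per_row: int) -> List[float]:
    """Emulate y = A x with `threads_per_row` (hypothetical) threads per row."""
    n_cols = len(x)
    y = []
    for row in A:
        # Thread t sums columns t, t + T, t + 2T, ... (T = threads_per_row), so
        # neighbouring threads read neighbouring elements of a row-major row.
        partials = [
            sum((row[j] * x[j] for j in range(t, n_cols, threads_per_row)), 0.0)
            for t in range(threads_per_row)
        ]
        y.append(tree_reduce(partials))
    return y


def autotune(A: Matrix, x: Sequence[float],
             candidates: Sequence[int] = (1, 2, 4, 8, 16, 32),
             repeats: int = 3) -> Optional[int]:
    """Return the candidate threads_per_row with the lowest measured run time."""
    reference = gemv_threads_per_row(A, x, 1)
    best_t, best_time = None, float("inf")
    for t in candidates:
        start = time.perf_counter()
        for _ in range(max(1, repeats)):
            y = gemv_threads_per_row(A, x, t)
        elapsed = time.perf_counter() - start
        # Skip configurations whose result drifts from the T = 1 reference
        # by more than a small relative tolerance (rounding differs with T).
        ok = all(abs(a - b) <= 1e-9 * max(1.0, abs(b)) for a, b in zip(y, reference))
        if ok and elapsed < best_time:
            best_t, best_time = t, elapsed
    return best_t


if __name__ == "__main__":
    A = [[float(i + j) for j in range(6)] for i in range(4)]
    x = [1.0] * 6
    print(gemv_threads_per_row(A, x, 4))  # [15.0, 21.0, 27.0, 33.0]
    print(autotune(A, x))  # fastest candidate; varies from run to run
```

Timings of this CPU emulation say nothing about GPU performance. On real hardware the same sweep would launch the actual kernel for each configuration and for each matrix shape of interest, which is one way auto-tuning can adapt a single kernel to tall, wide and square matrices.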
April 18, 2012 by hgpu