high performance computing on graphics processing units: hgpu.org

High-Performance Matrix-Vector Multiplication on the GPU

Hans Henrik Brandenborg Sorensen

Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800 Lyngby, Denmark

Springer-Verlag Berlin Heidelberg, pp. 377-386, 2012

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
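
To make the idea of fine-grained parallelism concrete, the following is a minimal CPU-side sketch in plain Python. It is not the paper's kernel. It only emulates one possible work decomposition in which several cooperating threads share a single matrix row, each accumulating a strided partial sum that is then reduced. The function names and the `threads_per_row` parameter are hypothetical, chosen for illustration; on a real GPU a parameter of this kind is the sort of setting an auto-tuner might vary with matrix shape.

```python
def matvec_reference(A, x):
    """Plain row-by-row y = A x, used as the correctness baseline."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_threads_per_row(A, x, threads_per_row):
    """Emulate `threads_per_row` cooperating threads per matrix row.

    Emulated thread t of a row accumulates columns t, t + T, t + 2T, ...
    (a strided partial sum). The partial sums are then added together,
    which stands in for the reduction a GPU kernel would perform.
    Assumption: this decomposition is illustrative, not taken from the paper.
    """
    if threads_per_row < 1:
        raise ValueError("threads_per_row must be at least 1")
    y = []
    for row in A:
        partial = [0] * threads_per_row
        for t in range(threads_per_row):  # one iteration per emulated thread
            for j in range(t, len(row), threads_per_row):
                partial[t] += row[j] * x[j]
        y.append(sum(partial))  # reduction across the row's threads
    return y
```

With `threads_per_row = 1` the sketch reduces to one thread per row, which leaves few independent work items when the matrix has few rows. Larger values split each row across more emulated threads, so a short, wide matrix still exposes plenty of parallel work.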

Tags: BLAS, Computer science, CUBLAS, CUDA, Linear Algebra, nVidia, Tesla C2050

April 18, 2012 by hgpu
