
MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs

Tingxing Dong, Azzam Haidar, Piotr Luszczek, Stanimire Tomov, Ahmad Abdelfattah, Jack Dongarra
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, 37996
ICL Tech Report, August 2016


A particularly challenging class of problems arising in many applications, called batched problems, involves linear algebra operations on many small matrices. To address them, we propose and design batched BLAS (Basic Linear Algebra Subprograms) routines: Level-2 GEMV and Level-3 GEMM. We illustrate how to optimize batched GEMV and GEMM so that they support advanced batched factorizations (e.g., bi-diagonalization) and other BLAS-based routines (e.g., forward/back substitution) and achieve high performance on GPUs. Our solutions achieve up to 2.8-3x speedups over CUBLAS and MKL where comparable routines exist. We applied our batched methodology to a real-world hydrodynamic application by reformulating its tensor operations as batched GEMV and GEMM operations. A 2.5x speedup and a 1.4x greenup (energy-efficiency improvement) are obtained by changing only 10% of the code, and the accelerated application scales on the Titan supercomputer up to 4,096 nodes.


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org