MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, 37996
ICL Tech Report, August 2016
@techreport{ICL970,
  title    = {MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs},
  author   = {Tingxing Dong and Azzam Haidar and Piotr Luszczek and Stanimire Tomov and Ahmad Abdelfattah and Jack Dongarra},
  journal  = {ICL Tech Report},
  year     = {2016},
  month    = {August},
  keywords = {Batched, Bi-diagonalization, gpu, Hydrodynamic}
}
A particularly challenging class of problems arising in many applications, known as batched problems, involves linear algebra operations on many small matrices. To address them, we propose and design batched BLAS (Basic Linear Algebra Subprograms) routines, specifically Level-2 GEMV and Level-3 GEMM. We illustrate how to optimize batched GEMV and GEMM so that they support higher-level batched factorizations (e.g., bi-diagonalization) and other BLAS routines (e.g., forward/backward substitution) at optimal performance on GPUs. Our solutions achieve up to 2.8-3x speedups over the corresponding CUBLAS and MKL solutions, where such comparisons are possible. We apply our batched methodology to a real-world hydrodynamics application by reformulating its tensor operations as batched GEMV and GEMM operations; a 2.5x speedup and a 1.4x greenup (energy-efficiency improvement) are obtained by changing only about 10% of the code. We also accelerate and scale the application on the Titan supercomputer up to 4096 nodes.
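The report's optimized kernels are MAGMA's own; as a minimal sketch of the batched BLAS interface the abstract describes, the following CUDA C program performs many small GEMMs in a single call using cuBLAS's cublasDgemmBatched. The matrix size (32x32), batch count (1000), and data values are illustrative assumptions, not figures from the report.

// Batched GEMM sketch: C_i = A_i * B_i for `batch` independent 32x32 problems.
// Compile with: nvcc batched_gemm.cu -lcublas
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 32;         /* each matrix is n x n (assumed small, fixed size) */
    const int batch = 1000;   /* number of independent small problems (assumed) */
    const size_t bytes = (size_t)n * n * sizeof(double);

    /* Fill one host matrix with ones and reuse it for every A_i and B_i. */
    double *hA = (double *)malloc(bytes);
    for (int i = 0; i < n * n; ++i) hA[i] = 1.0;

    /* One contiguous device slab per operand... */
    double *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes * batch);
    cudaMalloc((void **)&dB, bytes * batch);
    cudaMalloc((void **)&dC, bytes * batch);

    /* ...plus the per-matrix pointer arrays the batched API consumes. */
    double **hptr = (double **)malloc(3 * batch * sizeof(double *));
    for (int i = 0; i < batch; ++i) {
        hptr[i]             = dA + (size_t)i * n * n;
        hptr[batch + i]     = dB + (size_t)i * n * n;
        hptr[2 * batch + i] = dC + (size_t)i * n * n;
        cudaMemcpy(hptr[i],         hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(hptr[batch + i], hA, bytes, cudaMemcpyHostToDevice);
    }
    double **dptr;
    cudaMalloc((void **)&dptr, 3 * batch * sizeof(double *));
    cudaMemcpy(dptr, hptr, 3 * batch * sizeof(double *), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;

    /* One call launches all `batch` multiplications instead of `batch`
       separate kernel launches. */
    cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                       &alpha,
                       (const double * const *)(dptr),         n,
                       (const double * const *)(dptr + batch), n,
                       &beta,
                       dptr + 2 * batch,                       n,
                       batch);
    cudaDeviceSynchronize();

    /* Spot check: every entry of every C_i should equal n. */
    double c00;
    cudaMemcpy(&c00, hptr[2 * batch], sizeof(double), cudaMemcpyDeviceToHost);
    printf("C[0](0,0) = %.1f (expected %d)\n", c00, n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dptr);
    free(hA); free(hptr);
    return 0;
}

Grouping all the small GEMMs into one call amortizes kernel-launch overhead and exposes enough parallelism to saturate the GPU, which is the central idea behind the batched BLAS approach the report develops.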