ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, Dingwen Tao
Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487, USA
arXiv:2002.03258 [cs.DC], (12 Feb 2020)

@misc{rivera2020ism2,
   title={ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs},
   author={Cody Rivera and Jieyang Chen and Nan Xiong and Shuaiwen Leon Song and Dingwen Tao},
   year={2020},
   eprint={2002.03258},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}

Linear algebra operations are widely used in big data analytics and scientific computing. Much work has been done on optimizing linear algebra operations on GPUs for regular-shaped inputs. However, few works focus on fully utilizing GPU resources when the input is not regular-shaped. Existing optimizations do not fully utilize memory bandwidth and computing power, so they achieve only sub-optimal performance. In this paper, we propose two efficient irregular-shaped matrix-matrix multiplication (GEMM) algorithms for GPUs, called TSM2 and ISM2. Both focus on optimizing GEMMs with various input sizes where at least one of the matrices is tall-and-skinny. We implement the proposed algorithms and test them on several modern Nvidia GPU micro-architectures. Experiments show that, compared to the state of the art, TSM2 speeds up the computation by 1.1x~3x and improves memory bandwidth utilization and computing power utilization by 8%~47.6% and 7%~37.3%, respectively, when the regular matrix is relatively large or medium in size. Moreover, ISM2 speeds up GEMM by 1.1x~3.5x and improves memory bandwidth utilization by up to 55% when the regular matrix is relatively small.
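To illustrate why memory bandwidth matters so much for tall-and-skinny inputs, the following is a minimal back-of-the-envelope sketch, not the paper's method. It assumes a shape that the abstract does not spell out: a regular n x n matrix A multiplied by a tall-and-skinny n x k matrix B, with every matrix moved through global memory exactly once. The function name and the 8-byte (double precision) element size are hypothetical choices made for this example.

```python
def gemm_arithmetic_intensity(n: int, k: int, bytes_per_element: int = 8) -> float:
    """Estimate FLOPs per byte for C (n x k) = A (n x n) @ B (n x k).

    Assumption (illustrative only): A and B are each read once and C is
    written once, so caching effects and re-reads are ignored.
    """
    # Each of the n * k output elements needs n multiplies and n adds.
    flops = 2 * n * n * k
    # Elements moved: A has n*n, B has n*k, C has n*k.
    bytes_moved = bytes_per_element * (n * n + 2 * n * k)
    return flops / bytes_moved


if __name__ == "__main__":
    n = 10240
    for k in (2, 8, 16, n):
        intensity = gemm_arithmetic_intensity(n, k)
        print(f"n={n}, k={k}: {intensity:.2f} FLOPs per byte")
```

Under these assumptions, a small k gives well under a few FLOPs per byte, so the multiplication is limited by how fast A can be streamed from memory, whereas the square case (k = n) performs hundreds of FLOPs per byte and is limited by compute instead. This is consistent with the abstract reporting memory bandwidth utilization alongside speedup.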