Efficient Interleaved Batch Matrix Solvers for CUDA
School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland
arXiv:1909.04539 [cs.DC], (12 Sep 2019)
@misc{gloster2019efficient,
title={Efficient Interleaved Batch Matrix Solvers for CUDA},
author={Andrew Gloster and Enda Carroll and Miguel Bustamante and Lennon O'Naraigh},
year={2019},
eprint={1909.04539},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
In this paper we present a new methodology for data accesses when solving batches of tridiagonal and pentadiagonal systems that all share the same left-hand-side (LHS) matrix. Storing only one copy of this matrix significantly reduces storage overheads, and we show that it also reduces compute time. Together, these two results yield a more efficient implementation than the current state-of-the-art algorithms cuThomasBatch and cuPentBatch, allowing a greater number of systems to be solved on a single GPU.
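The idea of a single shared LHS with interleaved right-hand sides can be illustrated with a minimal sketch. This is not the authors' CUDA implementation: it is a plain-Python model of the tridiagonal case only, the function names `factor_tridiagonal` and `solve_interleaved` are hypothetical, and the loop over `k` stands in for what would be one GPU thread per system. It assumes the shared matrix needs no pivoting (for example, it is diagonally dominant).

```python
def factor_tridiagonal(lower, diag, upper):
    """Pre-factor the one shared LHS (forward sweep of the Thomas algorithm).

    lower[0] and upper[-1] are unused padding entries.
    """
    n = len(diag)
    c_prime = [0.0] * n    # modified upper diagonal
    inv_pivot = [0.0] * n  # reciprocal of each pivot
    inv_pivot[0] = 1.0 / diag[0]
    c_prime[0] = upper[0] * inv_pivot[0]
    for i in range(1, n):
        inv_pivot[i] = 1.0 / (diag[i] - lower[i] * c_prime[i - 1])
        c_prime[i] = upper[i] * inv_pivot[i]
    return c_prime, inv_pivot


def solve_interleaved(lower, c_prime, inv_pivot, rhs, batch):
    """Solve `batch` systems that share one pre-factored LHS.

    Interleaved layout (an assumption of this sketch): row i of system k
    is stored at rhs[i * batch + k], so neighbouring systems sit in
    neighbouring memory locations for every row.
    """
    n = len(inv_pivot)
    x = list(rhs)
    for k in range(batch):  # stands in for one GPU thread per system
        # Forward substitution on the right-hand side only.
        x[k] *= inv_pivot[0]
        for i in range(1, n):
            prev = x[(i - 1) * batch + k]
            x[i * batch + k] = (x[i * batch + k] - lower[i] * prev) * inv_pivot[i]
        # Backward substitution.
        for i in range(n - 2, -1, -1):
            x[i * batch + k] -= c_prime[i] * x[(i + 1) * batch + k]
    return x
```

In this sketch the LHS occupies three arrays of length n regardless of the batch size, whereas storing a separate copy per system would need three arrays of length n times the batch size. Only the right-hand sides scale with the number of systems.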
September 15, 2019 by hgpu