
Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

Zijing Gu
arXiv:2007.13055 [cs.MS] (26 Jul 2020)

@misc{gu2020optimizing,
   title={Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM},
   author={Zijing Gu},
   year={2020},
   eprint={2007.13055},
   archivePrefix={arXiv},
   primaryClass={cs.MS}
}

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With the automatic parameter tuning in TVM, our cross-thread reduction based implementation achieved competitive or better performance compared with other state-of-the-art frameworks.
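To make the operation in the abstract concrete, here is a minimal pure-Python reference sketch of a dense-times-block-sparse product. It assumes the sparse operand is stored in a BSR-style layout (nonzero blocks, block column indices, block row pointers); the abstract does not state the paper's actual storage format, and every name below (`dense_times_bsr`, `bsr_to_dense`, `DATA`, `INDICES`, `INDPTR`) is hypothetical. It is not the paper's TVM schedule or generated CUDA code, only a statement of what such a kernel computes.

```python
def bsr_to_dense(data, indices, indptr, block_shape, n_cols):
    """Expand an assumed BSR-style matrix into a dense list of lists."""
    br, bc = block_shape
    n_block_rows = len(indptr) - 1
    dense = [[0.0] * n_cols for _ in range(n_block_rows * br)]
    for i in range(n_block_rows):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]  # block column of the p-th stored block
            for r in range(br):
                for c in range(bc):
                    dense[i * br + r][j * bc + c] = data[p][r][c]
    return dense


def dense_times_bsr(x, data, indices, indptr, block_shape, n_cols):
    """Compute Y = X @ W with X dense (M x K) and W block-sparse (K x n_cols)."""
    br, bc = block_shape
    n_block_rows = len(indptr) - 1
    y = [[0.0] * n_cols for _ in x]
    for m, x_row in enumerate(x):
        # Only stored (nonzero) blocks are visited; zero blocks cost nothing.
        for i in range(n_block_rows):
            for p in range(indptr[i], indptr[i + 1]):
                j = indices[p]
                block = data[p]
                for r in range(br):
                    xv = x_row[i * br + r]
                    for c in range(bc):
                        y[m][j * bc + c] += xv * block[r][c]
    return y


def matmul(a, b):
    """Plain dense matrix product, used only as a correctness check."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Illustrative 4 x 4 matrix with 2 x 2 blocks; three of four blocks are stored.
BLOCK_SHAPE = (2, 2)
N_COLS = 4
DATA = [
    [[1, 2], [3, 4]],    # block row 0, block column 0
    [[5, 6], [7, 8]],    # block row 0, block column 1
    [[9, 1], [2, 3]],    # block row 1, block column 1
]
INDICES = [0, 1, 1]
INDPTR = [0, 2, 3]
X = [[1, 0, 2, 1], [0, 3, 1, 2]]

Y = dense_times_bsr(X, DATA, INDICES, INDPTR, BLOCK_SHAPE, N_COLS)
```

The sums over `i` and `r` together form the reduction axis of the product. A cross-thread reduction, as named in the abstract, would split that accumulation across GPU threads and combine the partial sums; how the paper partitions it is not described here.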
