Design Principles for Sparse Matrix Multiplication on the GPU

Carl Yang, Aydin Buluc, John D. Owens
University of California, Davis CA 95616, USA
arXiv:1803.08601 [cs.DC], (22 Mar 2018)


   title={Design Principles for Sparse Matrix Multiplication on the GPU},

   author={Yang, Carl and Buluc, Aydin and Owens, John D.},






Download Download (PDF)   View View   Source Source   



We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients – (i) merge-based load-balancing and (ii) row-major coalesced memory access – we demonstrate a 3.6x peak speedup and a 23.5% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
Rating: 3.7/5. From 3 votes.
Please wait...

* * *

* * *

* * *

HGPU group © 2010-2022 hgpu.org

All rights belong to the respective authors

Contact us: