RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

Benjamin Brock, Aydın Buluç, Katherine Yelick
EECS Department, University of California, Berkeley, CA
arXiv:2311.18141 [cs.DC], (29 Nov 2023)

@misc{brock2023rdmabased,
   title={RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs},
   author={Benjamin Brock and Aydın Buluç and Katherine Yelick},
   year={2023},
   eprint={2311.18141},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}

Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse-times-dense (SpMM) and sparse-times-sparse (SpGEMM) algorithms and evaluate their performance on GPUs in a distributed-memory setting. Our RDMA-based implementations use the NVSHMEM communication library for direct, asynchronous, one-sided communication between GPUs. We compare our asynchronous implementations to state-of-the-art bulk synchronous GPU libraries as well as to a CUDA-aware MPI implementation of the SUMMA algorithm. We find that asynchronous RDMA-based implementations can offer favorable performance compared to bulk synchronous implementations, while also allowing for the straightforward implementation of novel work-stealing algorithms.
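The following minimal, single-process Python sketch illustrates the asynchronous, one-sided pattern using a tile-based SpMM. It assumes a "stationary C" layout: simulated GPU (i, j) owns tile (i, j) of A, B, and C, and it pulls the A and B tiles it needs. The helper `remote_get` is hypothetical. It stands in for a one-sided RDMA read (in NVSHMEM, a get operation) that completes without the owning GPU taking part, in contrast to the collective broadcasts of bulk synchronous SUMMA. The staggered starting offset `(i + j + step) % p` is also an illustrative assumption and is not necessarily the paper's schedule.

```python
import copy


def ceil_div(a, b):
    return -(-a // b)


def partition_sparse(coo, n, k, p):
    """Split an n x k sparse matrix (COO triples) into p x p tiles (local coordinates)."""
    rs, cs = ceil_div(n, p), ceil_div(k, p)
    tiles = {(i, j): [] for i in range(p) for j in range(p)}
    for r, c, v in coo:
        tiles[(r // rs, c // cs)].append((r % rs, c % cs, v))
    return tiles


def partition_dense(B, p):
    """Split a k x m dense matrix (list of rows) into p x p tiles."""
    ks, ms = ceil_div(len(B), p), ceil_div(len(B[0]), p)
    return {(a, b): [row[b * ms:(b + 1) * ms] for row in B[a * ks:(a + 1) * ks]]
            for a in range(p) for b in range(p)}


def remote_get(store, key):
    """Hypothetical stand-in for a one-sided RDMA get: copy a tile from its owner."""
    return copy.deepcopy(store[key])


def local_spmm(a_tile, b_tile, c_tile):
    """c_tile += a_tile @ b_tile, with a_tile in COO form and b_tile/c_tile dense."""
    for r, c, v in a_tile:
        row_b, row_c = b_tile[c], c_tile[r]
        for x in range(len(row_c)):
            row_c[x] += v * row_b[x]


def stationary_c_spmm(coo, n, B, p):
    """Compute C = A @ B, where simulated GPU (i, j) owns and computes tile C[i, j]."""
    k, m = len(B), len(B[0])
    A_tiles, B_tiles = partition_sparse(coo, n, k, p), partition_dense(B, p)
    rs, ms = ceil_div(n, p), ceil_div(m, p)
    C = [[0.0] * m for _ in range(n)]
    for i in range(p):
        for j in range(p):  # each (i, j) plays the role of one GPU
            r0, r1 = i * rs, min((i + 1) * rs, n)
            c0, c1 = j * ms, min((j + 1) * ms, m)
            c_tile = [[0.0] * max(c1 - c0, 0) for _ in range(max(r1 - r0, 0))]
            for step in range(p):
                kk = (i + j + step) % p  # staggered start spreads reads across owners
                a = remote_get(A_tiles, (i, kk))  # pulled without the owner's help
                b = remote_get(B_tiles, (kk, j))
                local_spmm(a, b, c_tile)
            for r, row in enumerate(c_tile):
                C[r0 + r][c0:c0 + len(row)] = row
    return C


if __name__ == "__main__":
    coo = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0)]
    B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(stationary_c_spmm(coo, 3, B, 2))  # [[11.0, 14.0], [9.0, 12.0], [4.0, 8.0]]
```

Each owner issues its own reads, so nothing forces the GPUs into lockstep. That independence is also what makes work stealing easy to layer on. One possible design, not shown here, has GPUs claim k-steps from a shared counter with an atomic fetch-and-add.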