RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
EECS Department, University of California, Berkeley, CA
arXiv:2311.18141 [cs.DC], (29 Nov 2023)
@misc{brock2023rdmabased,
title={RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs},
author={Benjamin Brock and Aydın Buluç and Katherine Yelick},
year={2023},
eprint={2311.18141},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse (SpGEMM) algorithms, evaluating their performance running in a distributed memory setting on GPUs. Our RDMA-based implementations use the NVSHMEM communication library for direct, asynchronous one-sided communication between GPUs. We compare our asynchronous implementations to state-of-the-art bulk synchronous GPU libraries as well as a CUDA-aware MPI implementation of the SUMMA algorithm. We find that asynchronous RDMA-based implementations are able to offer favorable performance compared to bulk synchronous implementations, while also allowing for the straightforward implementation of novel work stealing algorithms.
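As a rough illustration of the SpMM kernel and of asynchronous work distribution, the following is a minimal single-process sketch in Python. It is not the authors' implementation and makes no NVSHMEM, MPI, or GPU calls. It assumes the sparse matrix is stored in CSR form, and it uses dynamic tile claiming through a shared counter, a simpler relative of work stealing. The names `csr_spmm`, `TileCounter`, and `spmm_dynamic` are hypothetical, and the lock-protected counter only stands in for a remote atomic fetch-and-add.

```python
from threading import Lock, Thread


def csr_spmm(indptr, indices, data, B, row_begin, row_end, C):
    """Compute rows [row_begin, row_end) of C = A @ B, with A given in CSR form."""
    ncols = len(B[0])
    for i in range(row_begin, row_end):
        acc = [0.0] * ncols
        # Walk the nonzeros of row i of A.
        for k in range(indptr[i], indptr[i + 1]):
            a = data[k]
            b_row = B[indices[k]]
            for j in range(ncols):
                acc[j] += a * b_row[j]
        C[i] = acc


class TileCounter:
    """Hypothetical stand-in for a remote atomic fetch-and-add on a shared counter."""

    def __init__(self):
        self._next = 0
        self._lock = Lock()

    def fetch_add(self):
        with self._lock:
            tile = self._next
            self._next += 1
            return tile


def spmm_dynamic(indptr, indices, data, B, tile_rows=2, workers=3):
    """Workers repeatedly claim the next unprocessed row tile until none remain."""
    nrows = len(indptr) - 1
    C = [None] * nrows
    ntiles = (nrows + tile_rows - 1) // tile_rows
    counter = TileCounter()

    def worker():
        while True:
            tile = counter.fetch_add()
            if tile >= ntiles:
                return
            lo = tile * tile_rows
            hi = min(lo + tile_rows, nrows)
            csr_spmm(indptr, indices, data, B, lo, hi, C)

    threads = [Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C


# Example: A = [[1,0,2],[0,3,0],[4,0,5],[0,0,6],[7,0,0]] in CSR form.
indptr = [0, 2, 3, 5, 6, 7]
indices = [0, 2, 1, 0, 2, 2, 0]
data = [1, 2, 3, 4, 5, 6, 7]
B = [[1, 2], [3, 4], [5, 6]]
C = spmm_dynamic(indptr, indices, data, B)
```

Because each tile is claimed by exactly one worker and tiles cover disjoint rows of C, no synchronization is needed on the output, and faster workers naturally take on more tiles.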
December 3, 2023 by hgpu