27298

Exploiting dynamic sparse matrices for performance portable linear algebra operations

Chris Stylianou, Michele Weiland
EPCC, The University of Edinburgh, Edinburgh, UK
arXiv:2209.06478 [cs.DC], (14 Sep 2022)

@misc{https://doi.org/10.48550/arxiv.2209.06478,

   doi={10.48550/ARXIV.2209.06478},

   url={https://arxiv.org/abs/2209.06478},

   author={Stylianou, Chris and Weiland, Michele},

   keywords={Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences, FOS: Computer and information sciences},

   title={Exploiting dynamic sparse matrices for performance portable linear algebra operations},

   publisher={arXiv},

   year={2022},

   copyright={Creative Commons Attribution 4.0 International}

}

Sparse matrices and linear algebra are at the heart of scientific simulations. More than 70 sparse matrix storage formats have been developed over the years, targeting a wide range of hardware architectures and matrix types. Each format is developed to exploit the particular strengths of an architecture, or the specific sparsity patterns of matrices, and the choice of the right format can be crucial in order to achieve optimal performance. The adoption of dynamic sparse matrices that can change the underlying data-structure to match the computation at runtime without introducing prohibitive overheads has the potential of optimizing performance through dynamic format selection. In this paper, we introduce Morpheus, a library that provides an efficient abstraction for dynamic sparse matrices. The adoption of dynamic matrices aims to improve the productivity of developers and end-users who do not need to know and understand the implementation specifics of the different formats available, but still want to take advantage of the optimization opportunity to improve the performance of their applications. We demonstrate that by porting HPCG to use Morpheus, and without further code changes, 1) HPCG can now target heterogeneous environments and 2) the performance of the SpMV kernel is improved up to 2.5x and 7x on CPUs and GPUs respectively, through runtime selection of the best format on each MPI process.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: