Compositional Compilation for Sparse, Irregular Data Parallelism

Adam Harries, Michel Steuwer, Murray Cole, Alan Gray, Christophe Dubach
University of Edinburgh
Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), 2016








While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high-level programming and high-performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is that this can be achieved by capturing sparsity- and irregularity-friendly implementations within the target space of a pattern-oriented, high-level compilation and transformation system. By working in a language rather than a library, we benefit from the ability to generate implementations by program-specific composition of building blocks which capture detailed, low-level implementation choices. Using sparse matrix-vector multiplication as a case study, we show that the resulting system produces implementations whose performance is competitive with, and sometimes exceeds, that obtained with leading ad-hoc approaches. We show that there are correlations between good implementation choices and simple measurable properties of the irregularity present in problem instances. These can be used to design heuristics which navigate the implementation space effectively. In a case study, we implement a number of versions of sparse matrix-vector multiplication, and achieve promising preliminary performance results. On very regular sparse matrices we are able to achieve up to 1.8x the performance of the state-of-the-art sparse matrix-vector implementation from the clSPARSE library, and up to 0.7x the performance on very irregular applications.
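For readers unfamiliar with the case study, the following is a minimal sequential sketch of sparse matrix-vector multiplication, together with one possible "simple measurable property" of irregularity. The CSR storage layout, the function names `spmv_csr` and `row_length_stats`, and the choice of row-length spread as the irregularity measure are all assumptions made for illustration; the abstract does not say which formats or measures the paper uses, and this is not the paper's generated GPU code.

```python
from statistics import mean, pstdev


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    y = []
    for row in range(len(row_ptr) - 1):
        # Non-zeros of this row live in values[start:end].
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


def row_length_stats(row_ptr):
    """Return (mean, population std dev) of non-zeros per row.

    A hypothetical irregularity measure: a larger spread means rows do
    very different amounts of work.
    """
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return mean(lengths), pstdev(lengths)


# 3x3 example matrix: [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

y = spmv_csr(row_ptr, col_idx, values, x)
avg_len, spread = row_length_stats(row_ptr)
```

In this sketch each row is an independent unit of work, so rows of very different lengths map unevenly onto parallel hardware. That imbalance is the kind of irregularity a heuristic could inspect before choosing an implementation.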
