
Compositional Compilation for Sparse, Irregular Data Parallelism

Adam Harries, Michel Steuwer, Murray Cole, Alan Gray, Christophe Dubach
University of Edinburgh
Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), 2016
@inproceedings{harries2016compositional,
  title     = {Compositional Compilation for Sparse, Irregular Data Parallelism},
  author    = {Harries, Adam and Steuwer, Michel and Cole, Murray and Gray, Alan and Dubach, Christophe},
  booktitle = {Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU)},
  year      = {2016}
}


While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high-level programming and high-performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is that this can be achieved by capturing sparsity- and irregularity-friendly implementations within the target space of a pattern-oriented, high-level compilation and transformation system. By working in a language rather than a library, we benefit from the ability to generate implementations by program-specific composition of building blocks which capture detailed, low-level implementation choices. Using sparse matrix-vector multiplication as a case study, we show that the resulting system produces implementations whose performance is competitive with, and sometimes outperforms, that obtained with leading ad-hoc approaches. We show that there are correlations between good implementation choices and simple, measurable properties of the irregularity present in problem instances. These can be used to design heuristics which navigate the implementation space effectively. In a case study, we implement a number of versions of sparse matrix-vector multiplication, and achieve promising preliminary performance results. On very regular sparse matrices we are able to achieve up to 1.8x the performance of the state-of-the-art sparse matrix-vector implementation from the clSPARSE library, and up to 0.7x the performance on very irregular applications.
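To make the irregularity concrete, the following is a minimal sketch of sparse matrix-vector multiplication over the common CSR (compressed sparse row) format. It is purely illustrative and not taken from the paper's system: the varying per-row trip count of the inner loop is exactly the kind of irregular, data-dependent work distribution that makes naive GPU mappings diverge.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x (y = A @ x).

    values  -- non-zero entries, row by row
    col_idx -- column index of each non-zero
    row_ptr -- start offset of each row in values/col_idx (length n_rows + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Each row may hold a different number of non-zeros: this varying
        # inner-loop length is the source of load imbalance on GPUs.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A 3x3 matrix with rows of differing sparsity:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 5, 6]]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
col_idx = [0, 2, 1, 0, 1, 2]
row_ptr = [0, 2, 3, 6]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 15.0]
```

A pattern-based compiler in the spirit of the paper would express this computation as a composition of high-level primitives (per-row maps and reductions) and then choose among low-level realisations (e.g. one thread per row versus one work-group per row) depending on measured irregularity.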
