On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

hgpu.org » Applications » Computer science » On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

Ulysse Beaugnon, Basile Clément, Nicolas Tollenaere, Albert Cohen

Ecole Normale Superieure

arXiv:1904.03383 [cs.PL], (6 Apr 2019)

@article{beaugnon2019representation,

title={On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU},

author={Beaugnon, Ulysse and Clément, Basile and Tollenaere, Nicolas and Cohen, Albert},

year={2019},

eprint={1904.03383},

archivePrefix={arXiv},

primaryClass={cs.PL}

}

Download (PDF)

View

Source

2153

views

Traditional optimizing compilers rely on rewrite rules to iteratively apply program transformations. This iterative approach hides optimization opportunities behind intermediate transformation steps. For instance, vectorization can only be applied to the innermost loop in a nest: one must first perform a loop interchange before even considering vectorization of an outer loop. In contrast, we propose an implementation framework representing programs as sets of possible implementation decisions. Specifying one decision can have an impact on others in a bidirectional manner: specifying that a loop must be vectorized prevents other loops from being nested inside it; conversely, specifying a loop as an outer loop will prevent it from being vectorized. These optimization decisions commute, obviating the pass ordering problem. We present a constraint programming system to formally define, represent and explore such implementation spaces. We also propose an exploration strategy combining tree search and branch-and-bound; the strength and novelty of this strategy reside in an analytical model of the lower bound on the execution time of a set of possible implementations. We showcase our approach on the construction and exploration of an implementation space for linear algebra kernels running on GPUs. We show this search space is expressive enough to represent complex decisions that fundamentally change the structure of the generated code. We also present preliminary results competitive with the performance of native GPU libraries.

Tags: Analytical model, Computer science, CUDA, Linear Algebra, nVidia, nVidia Quadro K4000, Programming Languages

April 14, 2019 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org