Position-Dependent Arrays and Their Application for High Performance Code Generation
The University of Edinburgh, Edinburgh, Scotland, United Kingdom
Functional High-Performance and Numerical Computing (FHPNC), 2019
@inproceedings{pizzuti2019position,
title={Position-Dependent Arrays and Their Application for High Performance Code Generation},
author={Pizzuti, Federico and Steuwer, Michel and Dubach, Christophe},
booktitle={Functional High-Performance and Numerical Computing (FHPNC)},
year={2019}
}
Modern parallel hardware promises unprecedented performance, but only for the few experts who can program it correctly. Code generation from high-level languages offers an attractive alternative, promising to deliver high performance automatically. Existing projects such as Accelerate, Futhark, Halide, and Lift show that this approach is feasible. Unfortunately, these efforts focus on computations over tensors: regularly shaped, higher-dimensional arrays. This limits their expressiveness and excludes many interesting data structures that are commonly encoded manually in memory, such as trees or triangular matrices. This paper presents an extended array type that lifts this restriction: in a multidimensional array, the size of a nested array may depend on its position in the surrounding array, which enables computations over less regularly shaped data structures. However, these position-dependent arrays pose new challenges for high-performance code generation, because determining where each element is located in memory becomes harder. This paper shows how these challenges are addressed by extending the existing Lift type system and compiler. The experimental results show that this approach enables efficient code generation for triangular matrix-vector multiplication, with speedups of up to 2x over cuBLAS on an Nvidia GPU. Furthermore, we show a use case for a low-level optimization that avoids unnecessary out-of-bounds checks in stencils, yielding up to 3x improvements over already optimized generated stencil code.
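To make the memory-layout problem concrete, here is a minimal Python sketch of a triangular matrix treated as a position-dependent array. It is not Lift's IR or the code the paper generates. It assumes a lower-triangular matrix stored row by row in one flat buffer, so row i holds i + 1 elements and starts at offset i(i + 1)/2. The helper names `row_offset`, `pack_lower`, and `trmv_packed` are hypothetical and chosen for illustration only.

```python
from typing import List


def row_offset(i: int) -> int:
    """Start index of row i in a packed lower-triangular buffer.

    Assumption: row k stores k + 1 elements, so rows 0..i-1 occupy
    1 + 2 + ... + i = i * (i + 1) / 2 slots before row i begins.
    """
    return i * (i + 1) // 2


def pack_lower(matrix: List[List[float]]) -> List[float]:
    """Flatten the lower triangle (diagonal included) row by row."""
    packed: List[float] = []
    for i, row in enumerate(matrix):
        # Row i keeps only its first i + 1 entries; the zeros above
        # the diagonal are never stored.
        packed.extend(row[: i + 1])
    return packed


def trmv_packed(packed: List[float], x: List[float]) -> List[float]:
    """Compute y = L @ x for a lower-triangular L stored packed."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        start = row_offset(i)
        acc = 0.0
        # The inner trip count depends on the row index i: this is the
        # position-dependent part. Every index stays inside row i.
        for j in range(i + 1):
            acc += packed[start + j] * x[j]
        y[i] = acc
    return y


if __name__ == "__main__":
    L = [[1, 0, 0],
         [2, 3, 0],
         [4, 5, 6]]
    packed = pack_lower(L)
    print(trmv_packed(packed, [1.0, 1.0, 1.0]))  # prints [1.0, 5.0, 15.0]
```

Because `row_offset(n)` equals the length of the packed buffer for an n x n matrix, every access `packed[start + j]` is in bounds by construction. This is the kind of layout reasoning a compiler has to perform once inner array sizes vary with position.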
August 25, 2019 by hgpu