Reducing Synchronous GPU Memory Transfers: Design and implementation of a Futhark compiler optimisation
University of Copenhagen, Faculty of Science
University of Copenhagen, 2022
@article{borgesen2022reducing,
title={Reducing Synchronous GPU Memory Transfers},
author={B{\o}rgesen, Philip Jon},
year={2022}
}
We present a series of dataflow-dependent program transformations that reduce memory transfers between a GPU and its host, and show how the problem of minimising memory transfers to the host amounts to finding minimum vertex cuts in a series of data dependency graphs. We provide a specialised algorithm to solve these minimisation problems, based on the Ford-Fulkerson max-flow algorithm, and detail techniques to model conditional execution and loops in a pure functional programming language. We present our work in the context of the array programming language Futhark, in whose compiler we have implemented our techniques. Empirical evaluation of 27 benchmark programs on four GPUs shows mean speedups of 117–158%, heavily skewed by significant improvements to a few programs.
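As a rough illustration of the reduction mentioned in the abstract, the sketch below computes a minimum vertex cut with the textbook node-splitting construction and a breadth-first variant of Ford-Fulkerson. It is not the specialised algorithm from the thesis, and it does not model conditionals or loops. The function name `min_vertex_cut`, the edge-list input, and the single source and sink are assumptions made only for this example.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_vertex_cut(edges, source, sink):
    """Return a minimum set of vertices (excluding source and sink)
    whose removal disconnects source from sink in a directed graph.

    Hypothetical helper for illustration; not the thesis's algorithm.
    """
    # Residual capacities, indexed as cap[u][v].
    cap = defaultdict(lambda: defaultdict(int))

    def add_edge(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0  # make sure the reverse residual edge exists

    # Node splitting: each vertex n becomes (n, "in") -> (n, "out").
    # Cutting that internal edge corresponds to removing the vertex.
    nodes = {n for edge in edges for n in edge}
    for n in nodes:
        add_edge((n, "in"), (n, "out"), INF if n in (source, sink) else 1)
    # Original edges get infinite capacity so they are never cut.
    for u, v in edges:
        add_edge((u, "out"), (v, "in"), INF)

    s, t = (source, "in"), (sink, "out")

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Walk back from the sink to recover the augmenting path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        if flow == INF:
            raise ValueError("source reaches sink directly; no vertex cut exists")
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # A vertex is in the cut when its "in" half is still reachable in the
    # residual graph but its "out" half is not.
    return {
        n for n in nodes
        if (n, "in") in parent and (n, "out") not in parent
    }
```

For example, `min_vertex_cut([("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "t")], "s", "t")` returns `{"c"}`: every path from `s` to `t` passes through `c`, so removing that one vertex is enough. In a setting like the one the abstract describes, the cut vertices would presumably mark where a value has to cross between device and host, but that mapping is an interpretation here and not taken from the thesis.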
July 17, 2022 by hgpu