Transformations of High-Level Synthesis Codes for High-Performance Computing
ETH Zurich, Switzerland
arXiv:1805.08288 [cs.DC], (23 May 2018)
@article{licht2018transformations,
title={Transformations of High-Level Synthesis Codes for High-Performance Computing},
author={de Fine Licht, Johannes and Meierhans, Simon and Hoefler, Torsten},
year={2018},
month={may},
eprint={1805.08288},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from software design are no longer sufficient to implement high-performance codes, due to fundamental differences between software and hardware architectures. In this work, we propose a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures with little off-chip data movement. To quantify the effect of our transformations, we use them to optimize a set of high-throughput FPGA kernels, demonstrating that they are sufficient to scale up parallelism within the hardware constraints of the target device. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.
May 26, 2018 by hgpu