Lina: a fast design optimisation tool for software-based FPGA programming
Catálogo USP
Instituto de Ciências Matemáticas e de Computação, São Carlos, 2022
@phdthesis{perina2022lina,
title={Lina: a fast design optimisation tool for software-based FPGA programming},
author={Perina, Andre Bannwart},
school={Universidade de S{\~a}o Paulo},
year={2022}
}
The continuous technology push in the semiconductor industry has led to the development of several alternative architectures for efficient computing. Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) are examples of devices used to accelerate applications. When properly programmed, FPGAs can provide massive parallelism for suitable tasks. However, designing for FPGAs is non-trivial and requires specific knowledge that deviates from usual software programming. As an alternative that increases programmability, High-Level Synthesis (HLS) tools allow high-level languages such as C/C++/OpenCL to be used as input for FPGA design. However, early experiments and other studies in the literature demonstrate that significant code modification is still necessary to obtain minimally acceptable results. This hinders the democratisation and simplification that HLS tools seek to achieve. The major contribution of this thesis operates at the C/C++ level: a design space exploration tool built around an estimator named Lina. Based on Lin-analyzer, Lina uses a traced execution of software code to approximate the compilation behaviour of Vivado HLS, a C/C++ HLS compiler for Xilinx FPGAs. For a given C/C++ kernel, Lina provides a fast approximation of metrics such as execution time and FPGA resource usage. Combined with the HLS compiler optimisation directives that Lina supports in its estimation, our exploration method allows the optimisation of not only execution time but also FPGA resource usage. We used Lina to optimise 16 C/C++ kernels from the PolyBench benchmark suite, and the estimated optimal solutions were among the 1% best options. An average performance speedup of 14-16× was achieved, accounting for 70% of the reachable speedup when considering the traversed design spaces.
Additionally, Lina allows the exploration of off-chip memory transactions, either in search of optimisations such as coalescing and data packing or to expose potential HLS compiler limitations that could degrade performance.
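To make the notion of HLS optimisation directives concrete, the following is a minimal sketch of a C++ kernel annotated with Vivado HLS pragmas for array partitioning, pipelining, and loop unrolling. The kernel `matvec`, the size `N`, and the chosen factors are illustrative assumptions and are not taken from the thesis; the abstract does not list which directives Lina models. A design space exploration would vary choices such as the unroll and partition factors and compare the estimated execution time and resource usage of each variant.

```cpp
#include <cstddef>

// Hypothetical problem size, chosen only for illustration.
constexpr std::size_t N = 16;

// Hypothetical kernel: y = A * x, written in the style accepted by Vivado HLS.
// A regular C++ compiler ignores the HLS pragmas, so the function also runs
// as plain software.
void matvec(const int A[N][N], const int x[N], int y[N]) {
// Split the arrays into 4 cyclic banks so that the 4 elements consumed by one
// unrolled iteration can come from different memories.
#pragma HLS ARRAY_PARTITION variable=A cyclic factor=4 dim=2
#pragma HLS ARRAY_PARTITION variable=x cyclic factor=4 dim=1
    for (std::size_t i = 0; i < N; ++i) {
        int acc = 0;
        for (std::size_t j = 0; j < N; ++j) {
// Request a pipelined inner loop with a target initiation interval of 1 and
// replicate its body 4 times; the compiler may not meet the requested target.
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
            acc += A[i][j] * x[j];
        }
        y[i] = acc;
    }
}
```

Each combination of directive values (for example, unroll factor 1, 2, 4, or 8 paired with a matching partition factor) is one point in the design space. An estimator in the role the abstract describes for Lina would score such points without running the full HLS compilation for every one.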
September 4, 2022 by hgpu