GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths
School of Computer Engineering, Nanyang Technological University, Singapore 639798
International Symposium on Field-Programmable Gate Arrays, 2016
@inproceedings{kapre2016gpu,
title={GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths},
author={Kapre, Nachiket and Ye, Deheng},
booktitle={International Symposium on Field-Programmable Gate Arrays},
year={2016}
}
Bitwidth optimization of FPGA datapaths can save hardware resources by choosing the fewest bits required for each datapath variable to achieve a desired quality of result. However, it is an NP-hard problem that requires unacceptably long runtimes when using sequential CPU-based heuristics. We show how to parallelize the key steps of bitwidth optimization on the GPU by performing a fast brute-force search over a carefully constrained search space. We develop a high-level synthesis methodology suitable for rapid prototyping of bitwidth-annotated RTL code generation using gcc’s GIMPLE backend. For range analysis, we evaluate sub-intervals in parallel to obtain tighter bounds than ordinary interval arithmetic. For bitwidth allocation, we enumerate the different bitwidth combinations in parallel by assigning each combination to a GPU thread. We demonstrate up to 10-1000x speedups for range analysis and 50-200x speedups for bitwidth allocation when comparing an NVIDIA K20 GPU implementation to an Intel Core i5-4570 CPU, while maintaining identical solution quality across various benchmarks. This allows us to generate tailor-made RTL with minimum bitwidths in hundreds of milliseconds instead of hundreds of minutes when starting from high-level C descriptions of dataflow computations.
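The two parallel steps described in the abstract can be illustrated with a minimal sequential sketch in Python. This is not the authors' implementation: the example datapath y = x * (1 - x), the error model (sum of 2^-b over all variables), the cost model (total bit count), and every function name are assumptions chosen only for illustration. Each loop iteration stands in for the work that one GPU thread would do.

```python
from itertools import product

def isub(a, b):
    """Interval subtraction: [a0 - b1, a1 - b0]."""
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    """Interval product: min and max over the four endpoint products."""
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def f_interval(x):
    # Hypothetical example datapath (an assumption): y = x * (1 - x)
    return imul(x, isub((1.0, 1.0), x))

def range_with_subintervals(lo, hi, n):
    """Split [lo, hi] into n sub-intervals, evaluate each one with
    interval arithmetic, and return the union of the results.
    On a GPU, each sub-interval could be handled by its own thread."""
    width = (hi - lo) / n
    bounds = [f_interval((lo + i * width, lo + (i + 1) * width))
              for i in range(n)]
    return (min(b[0] for b in bounds), max(b[1] for b in bounds))

def allocate_bitwidths(num_vars, min_bits, max_bits, max_error):
    """Brute-force search over all bitwidth combinations.
    Toy models (assumptions): error = sum of 2^-b, cost = sum of b.
    On a GPU, each combination could be assigned to one thread,
    followed by a reduction to find the cheapest feasible one."""
    best = None
    for combo in product(range(min_bits, max_bits + 1), repeat=num_vars):
        error = sum(2.0 ** -b for b in combo)
        cost = sum(combo)
        if error <= max_error and (best is None or cost < best[0]):
            best = (cost, combo)
    return best

if __name__ == "__main__":
    print(range_with_subintervals(0.0, 1.0, 1))  # plain interval arithmetic
    print(range_with_subintervals(0.0, 1.0, 4))  # four sub-intervals
    print(allocate_bitwidths(2, 1, 8, 0.25))
```

For this example, plain interval arithmetic over [0, 1] gives the bound [0, 1], while four sub-intervals tighten it to [0, 0.375]; the true range is [0, 0.25], so more sub-intervals bring the bound closer. The search space of the allocation step grows exponentially with the number of variables, which is why the abstract stresses a carefully constrained search space.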
February 10, 2016 by hgpu