Fast convolutional neural networks on FPGAs with hls4ml
European Organization for Nuclear Research (CERN), CH-1211 Geneva 23, Switzerland
arXiv:2101.05108 [cs.LG] (13 Jan 2021)
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with large convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate how to achieve an inference latency of 5 μs using convolutional architectures, while preserving state-of-the-art model performance. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be reduced by over 90% while maintaining the original model accuracy.
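To make the two compression ideas named in the abstract more concrete, here is a minimal Python sketch of magnitude-based pruning followed by rounding to a signed fixed-point grid. It is an illustration only, not the authors' method and not the hls4ml API: the function names `magnitude_prune` and `quantize_fixed_point`, the example weights, the 50% sparsity, and the 6-bit width are all assumptions chosen for the example. Quantization-aware training applies this kind of rounding during training so the model can adapt to it; the sketch shows only the rounding step itself.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed_point(x, total_bits, int_bits):
    """Round x to a signed fixed-point value.

    `total_bits` is the full width; `int_bits` counts the integer bits
    including the sign bit. Values outside the range saturate.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1))      # most negative integer code
    hi = 2 ** (total_bits - 1) - 1     # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


# Hypothetical weights, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.12, -0.02, 0.45]

pruned = magnitude_prune(weights, sparsity=0.5)
compressed = [quantize_fixed_point(w, total_bits=6, int_bits=1) for w in pruned]

print(pruned)
print(compressed)
```

Zeroed weights need no multiplier in hardware, and narrower fixed-point values need smaller multipliers, which is the intuition behind the resource reduction the abstract reports.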
January 17, 2021 by hgpu