Fast convolutional neural networks on FPGAs with hls4ml
European Organization for Nuclear Research (CERN), CH-1211 Geneva 23, Switzerland
arXiv:2101.05108 [cs.LG] (13 Jan 2021)
@misc{aarrestad2021fast,
  title={Fast convolutional neural networks on FPGAs with hls4ml},
  author={Thea Aarrestad and Vladimir Loncar and Maurizio Pierini and Sioni Summers and Jennifer Ngadiuba and Christoffer Petersson and Hampus Linander and Yutaro Iiyama and Giuseppe Di Guglielmo and Javier Duarte and Philip Harris and Dylan Rankin and Sergo Jindariani and Kevin Pedro and Nhan Tran and Mia Liu and Edward Kreinar and Zhenbin Wu and Duc Hoang},
  year={2021},
  eprint={2101.05108},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
We introduce an automated tool for deploying ultra-low-latency, low-power deep neural networks with large convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5 μs using convolutional architectures, while preserving state-of-the-art model performance. Considering benchmark models trained on the Street View House Numbers dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be reduced by over 90% while maintaining the original model accuracy.
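To make the two compression techniques named in the abstract concrete, here is a minimal NumPy sketch of magnitude-based weight pruning and uniform signed fixed-point quantization. The function names, the 90% sparsity target, and the 6-bit word width are illustrative assumptions chosen to mirror the paper's numbers; this is not the hls4ml or QKeras API.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights until the target fraction
    of zeros is reached -- a simple stand-in for iterative pruning."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_fixed_point(weights, total_bits, integer_bits):
    """Round weights onto a signed fixed-point grid with `total_bits`
    bits, `integer_bits` of them left of the binary point (analogous
    to an ap_fixed<total_bits, integer_bits> type on the FPGA)."""
    frac_bits = total_bits - integer_bits - 1  # one bit for the sign
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** integer_bits - 1.0 / scale
    min_val = -(2.0 ** integer_bits)
    return np.clip(np.round(weights * scale) / scale, min_val, max_val)

# Toy weight matrix standing in for a convolutional kernel.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=(16, 16))
w_pruned = prune_by_magnitude(w, sparsity=0.9)       # ~90% of weights zeroed
w_q = quantize_fixed_point(w_pruned, total_bits=6, integer_bits=0)
print(f"sparsity after pruning: {np.mean(w_pruned == 0):.2f}")
```

In quantization-aware training (as done with QKeras in the paper), the rounding step is applied inside the forward pass during training rather than after the fact, so the model learns weights that already tolerate the reduced precision.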
January 17, 2021 by hgpu