Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL
Center for Telecommunication Research and Innovation, Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
International Journal of Recent Technology and Engineering (IJRTE), Volume 8 Issue 2S6, 2019
@article{wai2019hardware,
title={Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL},
author={Wai, Yap June and bin Mohd Yussof, Zulkalnain and bin Md Salim, Sani Irwan},
year={2019}
}
The trend toward increasingly large models in Deep Neural Network (DNN) algorithms has boosted the performance of visual recognition tasks. These gains have come at the cost of increased computational complexity and memory bandwidth. Recent studies have explored fixed-point implementations of DNN algorithms such as AlexNet and VGG on Field Programmable Gate Arrays (FPGAs) to facilitate deployment on embedded systems. However, research on DNN object detection algorithms on FPGAs is still lacking. Consequently, we propose an implementation of Tiny-Yolo-v2 on a Cyclone V PCIe FPGA board using a High-Level Synthesis tool, the Intel FPGA Software Development Kit (SDK) for OpenCL. In this work, a systematic approach is proposed to convert the floating-point Tiny-Yolo-v2 algorithm into 8-bit fixed-point. Our experiments show that the 8-bit fixed-point Tiny-Yolo-v2 significantly reduces hardware consumption with only a 0.3% loss in accuracy. Finally, our implementation achieves a peak performance of 31.34 Giga Operations per Second (GOPS) and a performance density of 0.28 GOPS/DSP, comparable to prior works, at a 120 MHz working frequency.
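The abstract does not spell out the conversion procedure, so the following is only a minimal sketch of one common way to map floating-point values to 8-bit fixed-point: pick the number of fractional bits from the largest magnitude in a tensor, then scale, round, and saturate. The function names (`choose_frac_bits`, `quantize`, `dequantize`) and the per-tensor choice of fractional bits are assumptions made for illustration and are not taken from the paper.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick fractional bits so the largest magnitude fits in a signed word.

    Assumption: one fractional length per tensor (hypothetical helper).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Integer bits needed for the magnitude; the sign bit is counted separately.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to nearest, and saturate to the signed range."""
    scale = 2.0 ** frac_bits
    lo = -(1 << (total_bits - 1))      # -128 for 8 bits
    hi = (1 << (total_bits - 1)) - 1   # 127 for 8 bits
    return [max(lo, min(hi, round(v * scale))) for v in values]

def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]

weights = [0.5, -1.25, 0.03125, 1.9]
frac_bits = choose_frac_bits(weights)
codes = quantize(weights, frac_bits)
print(frac_bits, codes, dequantize(codes, frac_bits))
```

In this sketch the rounding error of any value that does not saturate is at most half of one quantization step, that is 2^-(frac_bits + 1). An actual deployment would typically choose fractional lengths separately for the weights and activations of each layer.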
January 19, 2020 by hgpu