An OpenCL-Based FPGA Accelerator for Faster R-CNN

Jianjing An, Dezheng Zhang, Ke Xu, Dong Wang

Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China

Entropy, Volume 24, Issue 10, 2022

DOI:10.3390/e24101346

Source codes

Package:

PipeCNN: An OpenCL-based FPGA Accelerator for Convolutional Neural Networks

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much research has focused on hardware accelerator designs for them. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still few accelerator designs for the Faster R-CNN (faster regions with CNN features) algorithm. Moreover, the inherently high computational and memory complexity of CNNs makes efficient accelerators challenging to design. This paper proposes an OpenCL-based software-hardware co-design scheme that implements the Faster R-CNN object detection algorithm on an FPGA. First, we design an efficient, deeply pipelined FPGA hardware accelerator that can run Faster R-CNN with different backbone networks. Then, we propose an optimized hardware-aware software algorithm that includes fixed-point quantization, layer fusion, and a multi-batch region of interest (RoI) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at a working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.
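The abstract names fixed-point quantization as one of the software optimizations without giving its parameters. As a rough illustration of the general idea only, the sketch below rounds floating-point weights to signed fixed-point integers and maps them back. The 8-bit width, the 5 fractional bits, the function names, and the sample weights are all assumptions made for this example and are not taken from the paper.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=5):
    """Round floats to signed fixed-point integers.

    total_bits and frac_bits are illustrative assumptions, not the
    paper's settings. Out-of-range values saturate at the limits.
    """
    scale = 1 << frac_bits                  # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))        # most negative representable code
    q_max = (1 << (total_bits - 1)) - 1     # most positive representable code
    quantized = []
    for v in values:
        q = int(round(v * scale))           # scale, then round to nearest integer
        quantized.append(max(q_min, min(q_max, q)))  # saturate instead of wrapping
    return quantized


def dequantize_fixed_point(quantized, frac_bits=5):
    """Map fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [q / scale for q in quantized]


# Hypothetical weights chosen to show both rounding and saturation.
weights = [0.7312, -1.25, 3.999, -4.5]
codes = quantize_fixed_point(weights)
restored = dequantize_fixed_point(codes)
print(codes)     # [23, -40, 127, -128]
print(restored)  # [0.71875, -1.25, 3.96875, -4.0]
```

With these assumed settings the representable range is -4.0 to 3.96875 in steps of 1/32, so the last two sample weights saturate. Narrow integer arithmetic of this kind is generally cheaper to implement in FPGA logic than floating point, which is the usual motivation for quantizing before deployment.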

Tags: Computational Complexity, Computer science, Deep learning, Design space exploration, FPGA, Neural networks, nVidia, OpenCL, Package, RNN, Tesla K40

October 2, 2022 by hgpu
