{"id":27302,"date":"2022-10-02T14:20:41","date_gmt":"2022-10-02T11:20:41","guid":{"rendered":"https:\/\/hgpu.org\/?p=27302"},"modified":"2022-10-02T14:20:41","modified_gmt":"2022-10-02T11:20:41","slug":"an-opencl-based-fpga-accelerator-for-faster-r-cnn","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=27302","title":{"rendered":"An OpenCL-Based FPGA Accelerator for Faster R-CNN"},"content":{"rendered":"<p>In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN&#8217;s inherently high computational complexity and high memory complexity bring challenges to the design of efficient accelerators. This paper proposes a software-hardware co-design scheme based on OpenCL to implement a Faster R-CNN object detection algorithm on FPGA. First, we design an efficient, deep pipelined FPGA hardware accelerator that can implement Faster R-CNN algorithms for different backbone networks. Then, an optimized hardware-aware software algorithm was proposed, including fixed-point quantization, layer fusion, and a multi-batch Regions of interest (RoIs) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP\/s at the working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10\u00d7 and 2.1\u00d7 inference throughput improvements, respectively.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN&#8217;s [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,90,3],"tags":[659,1782,1673,533,377,34,20,1793,176,1852,1543],"class_list":["post-27302","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-computational-complexity","tag-computer-science","tag-deep-learning","tag-design-space-exploration","tag-fpga","tag-neural-networks","tag-nvidia","tag-opencl","tag-package","tag-rnn","tag-tesla-k40"],"views":1473,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/27302","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=27302"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/27302\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=27302"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=27302"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=27302"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}