high performance computing on graphics processing units: hgpu.org


Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL

Yap June Wai, Zulkalnain bin Mohd Yussof, Sani Irwan bin Md Salim

Center for Telecommunication Research and Innovation, Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

International Journal of Recent Technology and Engineering (IJRTE), Volume 8 Issue 2S6, 2019

DOI:10.35940/ijrte.B1150.0782S619


The trend toward ever-larger models in Deep Neural Network (DNN) algorithms has boosted performance on visual recognition tasks. These gains have come at the cost of increased computational complexity and memory bandwidth. Recent studies have explored fixed-point implementations of DNN algorithms such as AlexNet and VGG on Field Programmable Gate Arrays (FPGAs) to enable deployment on embedded systems. However, research on DNN object detection algorithms on FPGAs is still lacking. We therefore propose an implementation of Tiny-Yolo-v2 on a Cyclone V PCIe FPGA board using a High-Level Synthesis tool, the Intel FPGA Software Development Kit (SDK) for OpenCL. In this work, a systematic approach is proposed to convert the floating-point Tiny-Yolo-v2 algorithm into 8-bit fixed-point. Our experiments show that the 8-bit fixed-point Tiny-Yolo-v2 significantly reduces hardware consumption with only a 0.3% loss in accuracy. Finally, our implementation achieves a peak performance of 31.34 Giga Operations per Second (GOPS) and a performance density of 0.28 GOPS/DSP, comparable to prior works, at a working frequency of 120 MHz.
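The abstract does not describe the float-to-8-bit conversion procedure itself. As a rough illustration only, the sketch below assumes a per-tensor dynamic fixed-point scheme: each tensor gets its own number of fractional bits, chosen so that its largest magnitude still fits in a signed 8-bit word, and dot products are computed with integer multiply-accumulates before a single rescale. This is a generic technique, not necessarily the authors' method; the function names (`choose_frac_bits`, `quantize`, `dequantize`, `fixed_point_dot`) and the sample weights and activations are hypothetical.

```python
import math
from typing import List


def choose_frac_bits(values: List[float], total_bits: int = 8) -> int:
    """Return the number of fractional bits for one tensor (assumed scheme).

    One bit is reserved for the sign; the remaining total_bits - 1 bits are
    split between the integer part (just enough to hold the largest
    magnitude) and the fractional part.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1
    # Integer bits needed for max_abs; negative when all values are small,
    # which frees extra bits for the fraction.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values: List[float], frac_bits: int, total_bits: int = 8) -> List[int]:
    """Map real values to signed total_bits-wide integers (value * 2^frac_bits)."""
    qmin = -(1 << (total_bits - 1))      # -128 for 8 bits
    qmax = (1 << (total_bits - 1)) - 1   #  127 for 8 bits
    scale = 2.0 ** frac_bits
    # round() is round-to-nearest (ties to even); results saturate to [qmin, qmax].
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(q: List[int], frac_bits: int) -> List[float]:
    """Recover approximate real values from fixed-point integers."""
    scale = 2.0 ** frac_bits
    return [x / scale for x in q]


def fixed_point_dot(qa: List[int], fa: int, qb: List[int], fb: int) -> float:
    """Integer multiply-accumulate, the kind of operation FPGA DSP blocks perform.

    Each product carries fa + fb fractional bits, so the wide integer
    accumulator is rescaled to a real value only once, at the end.
    """
    acc = 0
    for x, y in zip(qa, qb):
        acc += x * y
    return acc / (2.0 ** (fa + fb))


if __name__ == "__main__":
    weights = [0.52, -0.31, 0.07, -0.88]   # hypothetical layer weights
    activations = [1.9, 0.4, -2.7, 3.1]    # hypothetical input activations

    fw = choose_frac_bits(weights)
    fx = choose_frac_bits(activations)
    qw = quantize(weights, fw)
    qx = quantize(activations, fx)

    exact = sum(w * x for w, x in zip(weights, activations))
    approx = fixed_point_dot(qw, fw, qx, fx)
    print(f"frac bits: weights={fw}, activations={fx}")
    print(f"floating-point result: {exact:.4f}, 8-bit fixed-point result: {approx:.4f}")
```

Choosing the fractional length per tensor, rather than one global format, lets small-magnitude tensors (typically weights) keep more precision, while larger-range tensors (typically activations) avoid overflow. How closely this matches the paper's accuracy figure would depend on details the abstract does not give.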

Tags: Computer science, Computer vision, Deep learning, DSP, FPGA, Neural networks, OpenCL

January 19, 2020 by hgpu
