high performance computing on graphics processing units: hgpu.org

Accelerating ternary quantized convolutional neural networks using OpenCL for FPGA

Victor Joos de ter Beerst, Antoine Vanderschueren

Ecole polytechnique de Louvain, Universite catholique de Louvain

Ecole polytechnique de Louvain, Universite catholique de Louvain, 2019

FPGAs balance the reprogrammability of CPUs with the performance of ASICs, which makes them seem an ideal way to increase the throughput of neural networks. However, they must also prove highly competitive in a market dominated by GPUs. To that end, we focus on strengths of FPGAs that GPUs cannot exploit: neither extreme quantization nor sparsity is well suited to GPU acceleration. We use a quantized version of ResNet assembled from uniform building blocks to achieve better area utilization on the FPGA. Despite introducing highly quantized weights and activations, we limit the accuracy loss to approximately 2.5% compared with a floating-point model. Our final network uses ternary weights and 4-bit activations, together with shift-based batch normalization. Using OpenCL for high-level synthesis, we apply loop tiling, loop unrolling and loop interchange to speed up the convolution operation of our streamlined model on the Cyclone V FPGA. Because the weights are ternary, multiplications can be replaced with simple branching, which effectively eliminates the need for DSPs. Our final FPGA implementation achieves a latency of 17 ms per image using a quantized residual network.

Tags: Computer science, DSP, FPGA, Neural networks, OpenCL, Package, Thesis

March 24, 2019 by hgpu
