Accelerating ternary quantized convolutional neural networks using OpenCL for FPGA
École polytechnique de Louvain, Université catholique de Louvain, 2019
@mastersthesis{joos2019accelerating,
  title={Accelerating ternary quantized convolutional neural networks using OpenCL for FPGA},
  author={Joos de ter Beerst, Victor and Vanderschueren, Antoine and De Vleeschouwer, Christophe and Legat, Jean-Didier},
  school={École polytechnique de Louvain, Université catholique de Louvain},
  year={2019}
}
FPGAs balance the reprogrammability of CPUs with the performance of ASICs, making them an attractive platform for increasing the throughput of neural networks. However, they must also prove highly competitive in a market dominated by GPUs. To achieve this, we focus on strengths of FPGAs that GPUs cannot exploit: neither extreme quantization nor sparsity is well suited to GPU acceleration. We use a quantized version of ResNet assembled from uniform building blocks in order to achieve better area utilization on the FPGA. Despite heavily quantized weights and activations, we limit the accuracy loss to approximately 2.5% compared to a floating-point model. Our final network uses ternary weights and 4-bit activations, together with shift-based batch normalization. Using OpenCL for high-level synthesis, we apply loop tiling, loop unrolling and loop interchange to speed up the convolution operation of our streamlined model on the Cyclone V FPGA. Because the weights are ternary, multiplications can be replaced with simple branching, eliminating the need for DSP blocks. Our final FPGA implementation achieves a latency of 17 ms per image using a quantized residual network.
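To illustrate the two tricks the abstract mentions, here is a minimal sketch (not the authors' kernel code): a ternary-weight dot product in which the multiply of a conventional MAC is replaced by branching on the weight's sign, and a shift-based batch normalization step in which the per-channel scale is assumed to have been rounded offline to a power of two. The function names and the `shift`/`bias` parameters are hypothetical.

```c
/* Ternary-weight dot product: each weight is -1, 0, or +1, so instead
 * of multiplying we branch on the weight and add, subtract, or skip
 * the activation. On an FPGA this removes the need for DSP multipliers. */
int ternary_dot(const signed char *w, const int *act, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        if (w[i] > 0)      acc += act[i];  /* weight == +1 */
        else if (w[i] < 0) acc -= act[i];  /* weight == -1 */
        /* weight == 0: activation is skipped entirely */
    }
    return acc;
}

/* Shift-based batch normalization (sketch): with the scale constrained
 * to a power of two, the per-channel multiply becomes an arithmetic
 * shift; `shift` and `bias` are assumed to be precomputed offline. */
int shift_bn(int x, int shift, int bias) {
    return (shift >= 0 ? x << shift : x >> -shift) + bias;
}
```

In a real OpenCL kernel the branching would be applied inside the tiled and unrolled convolution loops, but the arithmetic per weight is the same as above.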
March 24, 2019 by hgpu