Optimising Convolutional Neural Networks Inference on Low-Powered GPUs
School of Informatics, University of Edinburgh, UK
12th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG), 2019
@inproceedings{rovder2019optimising,
  title={Optimising Convolutional Neural Networks Inference on Low-Powered GPUs},
  author={Rovder, Simon and Cano, Jos{\'e} and O'Boyle, Michael},
  booktitle={12th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG)},
  year={2019}
}
In this paper we present effective optimisation techniques for accelerating convolutional neural network inference on low-powered heterogeneous devices with OpenCL. Using LeNet and VGG-16 as test networks, we implement a custom neural network system in OpenCL and optimise it to minimise inference time. Our baseline system achieves a 17x speedup for LeNet. We also outline two methods for fast convolution: an iterative vectorised approach and an approach based on Morton GEMM. These two approaches deliver VGG-16 inference up to 3x faster than current state-of-the-art systems and outperform other custom neural network systems by factors of up to 1.82x.
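Morton GEMM here refers to a matrix multiplication whose operands are stored in Morton (Z-order) rather than row-major layout, so that elements adjacent in both matrix dimensions stay close together in memory, which tends to improve cache and memory-coalescing behaviour on GPUs. As a rough illustration of the indexing idea only (not the paper's actual OpenCL kernel; the names spread_bits and morton2d are our own), the standard bit-interleaving trick looks like this in C:

#include <stdint.h>

/* Spread the low 16 bits of v so each bit lands in an even position
   (bits 15..0 map to bit positions 30, 28, ..., 2, 0). */
static uint32_t spread_bits(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton (Z-order) index for a 2D coordinate: row bits occupy the even
   bit positions, column bits the odd ones. */
static uint32_t morton2d(uint32_t row, uint32_t col) {
    return spread_bits(row) | (spread_bits(col) << 1);
}

A GEMM kernel built on such a layout addresses its tiles through morton2d(row, col) instead of the usual row * stride + col, which is the essence of pairing Morton order with matrix multiplication.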