Deep Tensor Convolution on Multicores
Massachusetts Institute of Technology
arXiv:1611.06565 [cs.CV], (20 Nov 2016)
@article{budden2016deep,
title={Deep Tensor Convolution on Multicores},
author={Budden, David and Matveev, Alexander and Santurkar, Shibani and Chaudhuri, Shraman Ray and Shavit, Nir},
year={2016},
month={nov},
journal={arXiv preprint arXiv:1611.06565},
archivePrefix={arXiv},
eprint={1611.06565},
primaryClass={cs.CV}
}
Deep convolutional neural networks (ConvNets) have become a de facto standard for image classification and segmentation problems. These networks have also had early success in the video domain, despite failing to capture motion continuity and other rich temporal correlations. Evidence has since emerged that extending ConvNets to 3 dimensions leads to state-of-the-art performance across a broad set of video processing tasks by learning these joint spatiotemporal features. However, these early 3D networks have been restricted to shallower architectures with fewer channels than successful 2D networks due to memory constraints inherent to GPU implementations. In this study we present the first practical CPU implementation of tensor convolution optimized for deep networks of small kernels. Our implementation supports arbitrarily deep ConvNets of $N$-dimensional tensors due to the relaxed memory constraints of CPU systems, which can be further leveraged for an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D kernels). Because most of the optimized ConvNets in previous literature are 2- rather than 3-dimensional, we benchmark our performance against the most popular 2D implementations. Even in this special case, which is theoretically the least beneficial for our fast algorithm, we observe a 5- to 25-fold improvement in throughput compared to the previous state-of-the-art. We believe this work is an important step toward practical ConvNets for real-time applications, such as mobile video processing and biomedical image analysis, where high-performance 3D networks are a must.
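The abstract's 8-fold figure is consistent with a Winograd-style minimal filtering algorithm such as F(4,3) nested across three dimensions, which is an assumption here rather than something the abstract spells out. A short sketch of the multiplication counts under that assumption:

```python
# Sketch: arithmetic-cost comparison suggesting the ~8-fold reduction
# for 3D convolution with small (3-tap, C3D-style) kernels.
# Assumption (not stated in the abstract): a Winograd-style minimal
# filtering algorithm F(m, r) = F(4, 3), applied separably per dimension.

def direct_mults(m, r, dims):
    # Direct convolution: each of the m^dims outputs in a tile
    # requires r^dims multiplications, i.e. (m*r)^dims per tile.
    return (m * r) ** dims

def winograd_mults(m, r, dims):
    # Minimal 1-D filtering F(m, r) needs m + r - 1 multiplications;
    # nesting it across `dims` dimensions gives (m + r - 1)^dims.
    return (m + r - 1) ** dims

m, r = 4, 3  # 4 outputs per tile, 3-tap kernel (3x3 in 2D, 3x3x3 in 3D)
for dims in (2, 3):
    d, w = direct_mults(m, r, dims), winograd_mults(m, r, dims)
    print(f"{dims}D: direct={d}, winograd={w}, speedup={d / w:.2f}x")
# -> 2D: direct=144, winograd=36, speedup=4.00x
# -> 3D: direct=1728, winograd=216, speedup=8.00x
```

The per-dimension saving (12 multiplies down to 6) compounds geometrically, which is why moving from 2D to 3D turns a 4-fold reduction into the 8-fold one cited above.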
November 23, 2016 by hgpu