Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto
arXiv:1912.12675 [cs.DC], (29 Dec 2019)
@misc{zhang2019pipelined,
  title={Pipelined Training with Stale Weights of Deep Convolutional Neural Networks},
  author={Lifu Zhang and Tarek S. Abdelrahman},
  year={2019},
  eprint={1912.12675},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
The growth in the complexity of Convolutional Neural Networks (CNNs) is increasing interest in partitioning a network across multiple accelerators during training and pipelining the backpropagation computations over the accelerators. Existing approaches avoid or limit the use of stale weights through techniques such as micro-batching or weight stashing. These techniques either underutilize accelerators or increase memory footprint. We explore the impact of stale weights on statistical efficiency and performance in a pipelined backpropagation scheme that maximizes accelerator utilization and keeps memory overhead modest. We use 4 CNNs (LeNet-5, AlexNet, VGG and ResNet) and show that when pipelining is limited to early layers in a network, training with stale weights converges and yields models with inference accuracies comparable to those of non-pipelined training on the MNIST and CIFAR-10 datasets; the drop in accuracy is 0.4%, 4%, 0.83% and 1.45% for the 4 networks, respectively. However, when pipelining extends deeper into the network, inference accuracies drop significantly. We propose combining pipelined and non-pipelined training in a hybrid scheme to address this drop. We demonstrate the implementation and performance of our pipelined backpropagation in PyTorch on 2 GPUs using ResNet, achieving speedups of up to 1.8X over a 1-GPU baseline, with a small drop in inference accuracy.
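To illustrate the idea, the sketch below emulates, on a single device, the weight staleness that a pipeline of depth D induces on the early layers: gradients computed for the "pipelined" front stage at step t are only applied D steps later, while the later layers are updated immediately. The toy model, the split point, the delay D, and the random data are illustrative assumptions; the paper's actual scheme pipelines ResNet across 2 GPUs in PyTorch rather than buffering gradients on one device.

```python
# Minimal sketch of stale-weight updates for the early (pipelined) layers.
# Assumptions: a toy two-stage CNN, pipeline depth D = 2, random CIFAR-10-like
# batches. This is not the authors' 2-GPU implementation, only an emulation
# of the delayed gradient application their pipeline induces.
import collections
import torch
import torch.nn as nn

torch.manual_seed(0)

# Early layers (receive stale updates) and the remaining layers.
front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(4), nn.Flatten())
back = nn.Sequential(nn.Linear(16 * 4 * 4, 10))
opt_front = torch.optim.SGD(front.parameters(), lr=0.01)
opt_back = torch.optim.SGD(back.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

D = 2                          # pipeline depth (staleness in optimizer steps)
pending = collections.deque()  # front-stage gradients waiting to be applied

for step in range(50):
    x = torch.randn(8, 3, 32, 32)           # stand-in mini-batch
    y = torch.randint(0, 10, (8,))

    loss = loss_fn(back(front(x)), y)
    opt_front.zero_grad()
    opt_back.zero_grad()
    loss.backward()

    # Later layers: updated immediately, as in non-pipelined training.
    opt_back.step()

    # Early layers: stash this step's gradients and apply the ones computed
    # D steps ago, i.e. gradients that are now D weight updates stale.
    pending.append([p.grad.detach().clone() for p in front.parameters()])
    if len(pending) > D:
        stale_grads = pending.popleft()
        for p, g in zip(front.parameters(), stale_grads):
            p.grad = g
        opt_front.step()
```

Limiting the staleness to the front stage mirrors the abstract's observation that pipelining only the early layers preserves convergence, whereas applying the same delay deeper in the network degrades accuracy.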
January 5, 2020 by hgpu