GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training
University of Illinois at Urbana-Champaign, Urbana, IL
arXiv:1312.6186 [cs.CV], (21 Dec 2013)
@article{2013arXiv1312.6186P,
author={{Paine}, T. and {Jin}, H. and {Yang}, J. and {Lin}, Z. and {Huang}, T.},
title="{GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training}",
journal={ArXiv e-prints},
archivePrefix="arXiv",
eprint={1312.6186},
primaryClass="cs.CV",
keywords={Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Learning, Computer Science - Neural and Evolutionary Computing},
year={2013},
month={dec},
adsurl={http://adsabs.harvard.edu/abs/2013arXiv1312.6186P},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, whose large scale has been used mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that using GPU A-SGD it is possible to speed up training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
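The data-parallel side of A-SGD can be illustrated with a toy sketch: several workers each hold a shard of the data and apply gradient updates to a shared parameter without waiting for one another. This is only a minimal single-machine analogue of the idea (the paper's system uses GPU workers and a parameter server); the linear model, learning rate, and shard count below are illustrative assumptions, not details from the paper.

```python
import threading
import random

# Toy problem (hypothetical): recover w in y = w * x by least squares.
TRUE_W = 3.0
DATA = [(x, TRUE_W * x) for x in [random.uniform(-1, 1) for _ in range(400)]]

params = {"w": 0.0}  # shared parameter state (stands in for a parameter server)
LR = 0.05            # illustrative learning rate

def worker(shard):
    """Each worker sweeps its own data shard, reading the current shared
    parameter and applying its gradient update asynchronously (no lock,
    so updates from different workers may interleave or race)."""
    for x, y in shard:
        # gradient of 0.5 * (w*x - y)^2 with respect to w
        grad = (params["w"] * x - y) * x
        params["w"] -= LR * grad

# Data parallelism: split the data into 4 shards, one per worker thread.
shards = [DATA[i::4] for i in range(4)]
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(params["w"])  # converges near TRUE_W despite unsynchronized updates
```

Despite the racy updates, the shared parameter still converges, which is the key observation that makes asynchronous SGD practical at scale; the paper's contribution is running each such worker as a GPU-accelerated (model-parallel) trainer.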
December 24, 2013 by hgpu