Dogwild! – Distributed Hogwild for CPU & GPU
Flickr Vision & Machine Learning Group, Yahoo! Inc
Distributed Machine Learning and Matrix Computations, NIPS 2014 Workshop, 2014
@article{noel2014dogwild,
  title={Dogwild! -- Distributed Hogwild for CPU \& GPU},
  author={Noel, Cyprien and Osindero, Simon},
  year={2014}
}
Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time-consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [3], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture, implementing asynchronous SGD without increasing Caffe’s complexity. We isolate parallelization from Caffe’s existing SGD code, train unmodified models, and run on commodity hardware. Isolation is achieved by extending the Hogwild model, i.e. running parallel SGD solvers without synchronization, and by also removing synchronization between the solvers and the components in charge of streaming gradients between nodes. In this modular design, components interact exclusively through unsynchronized reads and writes to the weight buffer. Each component is free to loop over the weights at a different pace, keeping both compute and network resources fully utilized. SGD’s resiliency against gradient loss allows further performance improvements by avoiding reliable network protocols: it enables the use of multicast messages, and of low-level packet streaming through raw sockets or InfiniBand verbs. We show linear performance scaling for small clusters on MNIST, and early results on ImageNet.
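The modular design described in the abstract can be illustrated with a rough, self-contained C++ sketch. This is not Caffe code from the paper; the buffer layout, thread counts, and names such as solver and streamer are made up for illustration. Several SGD solver threads and a network-streaming thread all loop over one shared weight buffer, reading and writing without locks or barriers, so lost or stale updates are simply tolerated.

// Illustrative sketch only (hypothetical names, not the authors' implementation):
// Hogwild-style solver threads and a "streamer" component sharing one weight
// buffer with no synchronization between them.
#include <atomic>
#include <random>
#include <thread>
#include <vector>

constexpr int kDim = 1024;

// Shared weight buffer. Relaxed atomics model the unsynchronized reads and
// writes; races between solvers and the streamer are tolerated by SGD.
std::vector<std::atomic<float>> weights(kDim);

// One SGD solver: repeatedly reads weights, computes a (stand-in) gradient,
// and writes updated weights back without any locking.
void solver(int seed, int iters, float lr) {
  std::mt19937 rng(seed);
  std::normal_distribution<float> noise(0.f, 1.f);
  for (int it = 0; it < iters; ++it) {
    for (int i = 0; i < kDim; ++i) {
      float w = weights[i].load(std::memory_order_relaxed);
      float grad = noise(rng) * w;  // stand-in for a real gradient
      weights[i].store(w - lr * grad, std::memory_order_relaxed);
    }
  }
}

// Streaming component: loops over the same buffer at its own pace, never
// synchronizing with the solvers. In the paper's setting it would serialize
// weight deltas into unreliable multicast or raw-socket packets.
void streamer(int rounds) {
  for (int r = 0; r < rounds; ++r) {
    for (int i = 0; i < kDim; ++i) {
      float w = weights[i].load(std::memory_order_relaxed);
      (void)w;  // here: copy into an outgoing packet buffer
    }
  }
}

int main() {
  for (auto& w : weights) w.store(0.1f, std::memory_order_relaxed);
  std::vector<std::thread> threads;
  for (int s = 0; s < 4; ++s) threads.emplace_back(solver, s, 100, 0.01f);
  threads.emplace_back(streamer, 100);
  for (auto& t : threads) t.join();
}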
November 9, 2014 by hgpu