
Dogwild! – Distributed Hogwild for CPU & GPU

Cyprien Noel, Simon Osindero
Flickr Vision & Machine Learning Group, Yahoo! Inc
Distributed Machine Learning and Matrix Computations, NIPS 2014 Workshop, 2014

@inproceedings{noel2014dogwild,
   title     = {Dogwild! -- Distributed Hogwild for CPU \& GPU},
   author    = {Noel, Cyprien and Osindero, Simon},
   booktitle = {NIPS Workshop on Distributed Machine Learning and Matrix Computations},
   year      = {2014}
}

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [3], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe’s complexity. We isolate parallelization from Caffe’s existing SGD code, train unmodified models, and run on commodity hardware. Isolation is achieved by extending the Hogwild model, i.e. running parallel SGD solvers without synchronization, and by also removing synchronization between the solvers and the components in charge of streaming gradients between nodes. In this modular design, components interact exclusively through unsynchronized reads and writes to the weight buffer. Each component is free to loop over the weights at a different pace, keeping both compute and network resources fully utilized. SGD’s resiliency against gradient loss allows further performance improvements by avoiding reliable network protocols. It enables the use of multicast messages, and of low-level packet streaming through raw sockets or InfiniBand verbs. We show linear performance scaling for small clusters on MNIST, and early results on ImageNet.
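The following is a minimal sketch (not the authors' code) of the Hogwild-style design the abstract describes: several SGD solver threads and one streaming component all loop over a single shared weight buffer through plain, unsynchronized reads and writes, each at its own pace. The toy objective, buffer size, and all identifiers are illustrative assumptions, not part of Dogwild or Caffe.

    // Sketch of unsynchronized solver and streaming threads sharing one weight buffer.
    #include <cstdio>
    #include <random>
    #include <thread>
    #include <vector>

    constexpr int kDim = 1000;        // number of weights (assumed toy size)
    static float weights[kDim] = {};  // shared buffer; data races are intentional (Hogwild model)

    // One solver thread: compute a toy gradient and apply it in place, with no locking.
    void solver(int seed, int iters) {
      std::mt19937 rng(seed);
      std::normal_distribution<float> noise(0.0f, 0.01f);
      const float lr = 0.01f;
      for (int it = 0; it < iters; ++it) {
        for (int i = 0; i < kDim; ++i) {
          // Toy gradient: pull each weight toward 1.0, plus noise.
          float grad = (weights[i] - 1.0f) + noise(rng);
          weights[i] -= lr * grad;  // unsynchronized write
        }
      }
    }

    // Stand-in for the component that streams weights/gradients between nodes:
    // it sweeps the same buffer at its own pace; here it only reports a mean.
    void streamer(int rounds) {
      for (int r = 0; r < rounds; ++r) {
        float sum = 0.0f;
        for (int i = 0; i < kDim; ++i) sum += weights[i];  // unsynchronized read
        std::printf("stream pass %d, mean weight %.4f\n", r, sum / kDim);
      }
    }

    int main() {
      std::vector<std::thread> threads;
      for (int t = 0; t < 4; ++t) threads.emplace_back(solver, t, 200);  // parallel SGD solvers
      threads.emplace_back(streamer, 5);                                 // streaming component
      for (auto& th : threads) th.join();
      return 0;
    }

In the actual system, the streaming role would push the buffer over the network (multicast, raw sockets, or InfiniBand verbs) rather than print a summary; the key point illustrated is that no component ever waits on another.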