high performance computing on graphics processing units: hgpu.org

Dogwild! – Distributed Hogwild for CPU & GPU

Cyprien Noel, Simon Osindero

Flickr Vision & Machine Learning Group, Yahoo! Inc

Distributed Machine Learning and Matrix Computations, NIPS 2014 Workshop, 2014

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time-consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [3] that allow training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe’s complexity. We isolate parallelization from Caffe’s existing SGD code, train unmodified models, and run on commodity hardware. Isolation is achieved by extending the Hogwild model, i.e. running parallel SGD solvers without synchronization, so that synchronization is also removed between the solvers and the components in charge of streaming gradients between nodes. In this modular design, components interact exclusively through unsynchronized reads and writes to the weight buffer. Each component is free to loop over the weights at its own pace, keeping both compute and network resources fully utilized. Because SGD is resilient to gradient loss, performance can be improved further by avoiding reliable network protocols. This enables the use of multicast messages and of low-level packet streaming through raw sockets or InfiniBand verbs. We show linear performance scaling for small clusters on MNIST, and early results on ImageNet.
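To make the architecture in the abstract concrete, here is a minimal single-process Python sketch built on assumptions of our own. It is not Caffe’s or Dogwild’s API, and the delta-exchange scheme is an illustrative choice rather than the paper’s exact protocol. Two simulated nodes each hold a weight buffer for a toy least-squares problem (fitting y = 3x + 1). Hogwild-style solvers read and write that buffer without locks. A per-node "streamer" loops over the buffer, sends each parameter’s local change since its last send, and silently drops some messages to mimic an unreliable transport. Python generators interleaved by a round-robin scheduler stand in for parallel threads, so stale reads happen deterministically. All names (`hogwild_solver`, `lossy_streamer`, `simulate`, the `base_*` bookkeeping arrays) are hypothetical.

```python
import random

# Hypothetical toy objective: recover y = 3x + 1 by least squares.
TRUE_W, TRUE_B = 3.0, 1.0


def make_shard(n, seed):
    """Noiseless (x, y) samples forming one node's data shard."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, TRUE_W * x + TRUE_B) for x in xs]


def hogwild_solver(w, shard, lr, seed):
    """SGD solver that reads and writes the shared buffer w without locks.

    Yielding between the read and the write lets other components run in
    between, so gradients can be computed from stale weights (as in Hogwild).
    """
    rng = random.Random(seed)
    while True:
        x, y = rng.choice(shard)
        cur_w, cur_b = w[0], w[1]       # unsynchronized read
        yield
        err = cur_w * x + cur_b - y     # d(0.5 * err**2) / d(prediction)
        w[0] -= lr * err * x            # unsynchronized write to the current buffer
        w[1] -= lr * err
        yield


def lossy_streamer(src, src_base, dst, dst_base, drop_prob, seed, stats):
    """Loops over src one parameter per step and 'sends' the local change
    since the last send. Each message is dropped with probability drop_prob
    and never retransmitted, standing in for an unreliable transport."""
    rng = random.Random(seed)
    while True:
        for i in range(len(src)):
            delta = src[i] - src_base[i]  # local progress not yet sent
            src_base[i] += delta
            if rng.random() < drop_prob:
                stats["dropped"] += 1     # lost message: this progress never reaches dst
            else:
                dst[i] += delta
                dst_base[i] += delta      # keeps dst's streamer from echoing it back
            yield


def run(components, ticks):
    """Round-robin scheduler: each tick advances every component `pace` steps."""
    for _ in range(ticks):
        for gen, pace in components:
            for _ in range(pace):
                next(gen)


def simulate(ticks=10000, lr=0.05, drop_prob=0.2):
    w_a, w_b = [0.0, 0.0], [0.0, 0.0]        # one weight buffer per node
    base_a, base_b = [0.0, 0.0], [0.0, 0.0]  # values already accounted for per node
    shard_a, shard_b = make_shard(200, seed=1), make_shard(200, seed=2)
    stats = {"dropped": 0}
    components = [
        (hogwild_solver(w_a, shard_a, lr, seed=10), 1),
        (hogwild_solver(w_a, shard_a, lr, seed=11), 1),
        (hogwild_solver(w_b, shard_b, lr, seed=20), 1),
        (hogwild_solver(w_b, shard_b, lr, seed=21), 1),
        (lossy_streamer(w_a, base_a, w_b, base_b, drop_prob, 30, stats), 1),
        (lossy_streamer(w_b, base_b, w_a, base_a, drop_prob, 31, stats), 1),
    ]
    run(components, ticks)
    return w_a, w_b, stats


if __name__ == "__main__":
    w_a, w_b, stats = simulate()
    print("node A:", w_a, "node B:", w_b, "dropped messages:", stats["dropped"])
```

In this toy setting, dropped messages only mean that some of one node’s progress never reaches the other. Because every node keeps descending the same objective, both buffers still settle near (3, 1) without any retransmission. The `pace` field is where components could be given different speeds. A real system would instead run solvers and streamers as truly parallel threads or processes over raw sockets, multicast, or InfiniBand.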

Tags: Computer science, Computer vision, CUDA, Distributed computing, Machine learning, Neural networks, nVidia

November 9, 2014 by hgpu
