
Distributed Training of Deep Neuronal Networks: Theoretical and Practical Limits of Parallel Scalability

Janis Keuper
Fraunhofer ITWM
arXiv:1609.06870 [cs.CV] (22 Sep 2016)

@article{keuper2016distributed,
   title={Distributed Training of Deep Neuronal Networks: Theoretical and Practical Limits of Parallel Scalability},
   author={Keuper, Janis},
   year={2016},
   month={sep},
   eprint={1609.06870},
   archivePrefix={arXiv},
   primaryClass={cs.CV}
}


This paper presents a theoretical analysis and practical evaluation of the main bottlenecks standing in the way of a scalable distributed solution for training Deep Neuronal Networks (DNNs). The results show that the current state-of-the-art approach, data-parallelized Stochastic Gradient Descent (SGD), quickly turns into a heavily communication-bound problem. In addition, the paper presents simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond only a few dozen nodes, leading to poor scalability of DNN training in most practical scenarios.
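
To make the scheme the abstract refers to concrete, below is a minimal, hypothetical sketch of synchronous data-parallel SGD in NumPy. Several workers are emulated within a single process: each computes a gradient on its own mini-batch shard, and the local gradients are averaged before the shared weights are updated. All names, sizes and hyperparameters are illustrative assumptions, not the paper's code or experimental setup.

# Minimal sketch (not the paper's code): synchronous data-parallel SGD with
# gradient averaging, emulated in one process. Problem sizes are illustrative.
import numpy as np

def make_data(n_samples=4096, n_features=1000, seed=0):
    """Synthetic linear-regression data: y = X @ w_true + noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    w_true = rng.standard_normal(n_features)
    y = X @ w_true + 0.01 * rng.standard_normal(n_samples)
    return X, y

def worker_gradient(X_shard, y_shard, w):
    """Local gradient of the mean squared loss on one worker's shard."""
    residual = X_shard @ w - y_shard
    return X_shard.T @ residual / len(y_shard)

def data_parallel_sgd(X, y, n_workers=8, lr=0.01, steps=200, batch=256, seed=1):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Each worker draws its own mini-batch shard and computes a local gradient.
        local_grads = []
        for _ in range(n_workers):
            idx = rng.choice(len(y), size=batch // n_workers, replace=False)
            local_grads.append(worker_gradient(X[idx], y[idx], w))
        # Synchronization step: average the local gradients. In a real cluster
        # this is the all-reduce that moves one full-model-sized gradient per
        # worker per step -- the communication the paper identifies as the
        # scaling bottleneck.
        grad = np.mean(local_grads, axis=0)
        w -= lr * grad
    return w

if __name__ == "__main__":
    X, y = make_data()
    w = data_parallel_sgd(X, y)
    print("final training loss:", np.mean((X @ w - y) ** 2))

In an actual deployment the averaging line would be an all-reduce (or a parameter-server round trip), so every step each node exchanges a gradient the size of the full model regardless of how many nodes participate; this fixed per-step communication volume is why data-parallel SGD becomes increasingly communication-bound as node counts grow, which is the effect the paper analyzes.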