18403

Parallax: Automatic Data-Parallel Training of Deep Neural Networks

Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
Seoul National University
arXiv:1808.02621 [cs.DC], (8 Aug 2018)

@article{kim2018parallax,

   title={Parallax: Automatic Data-Parallel Training of Deep Neural Networks},

   author={Kim, Soojeong and Yu, Gyeong-In and Park, Hojin and Cho, Sungwoo and Jeong, Eunji and Ha, Hyeonmin and Lee, Sanha and Jeong, Joo Seong and Chun, Byung-Gon},

   year={2018},

   month={aug},

   archivePrefix={"arXiv"},

   primaryClass={cs.DC}

}

The employment of high-performance servers and GPU accelerators for training deep neural network models have greatly accelerated recent advances in machine learning (ML). ML frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to assist ML researchers to train their models in a distributed fashion. However, correctly and efficiently utilizing multiple machines and GPUs is still not a straightforward task for framework users due to the non-trivial correctness and performance challenges that arise in the distribution process. This paper introduces Parallax, a tool for automatic parallelization of deep learning training in distributed environments. Parallax not only handles the subtle correctness issues, but also leverages various optimizations to minimize the communication overhead caused by scaling out. Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on multiple CNN and RNN models, while requiring little effort from its users.
Rating: 2.0/5. From 1 vote.
Please wait...

* * *

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: