
Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Francesc Sastre Cabot
Universitat Politècnica de Catalunya (UPC) BarcelonaTech
Universitat Politècnica de Catalunya, 2017

@mastersthesis{sastre2017scalability,
   title={Scalability study of Deep Learning algorithms in high performance computer infrastructures},
   author={Sastre Cabot, Francesc},
   year={2017},
   school={Universitat Polit{\`e}cnica de Catalunya}
}


Deep learning algorithms owe their success to high-capacity models with millions of parameters that are tuned in a data-driven fashion. Because these models are trained by processing millions of examples, the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. This project shows how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster, the Minotauro GPU cluster at the Barcelona Supercomputing Center, using the TensorFlow framework. Two approaches to distributed training are used: synchronous and mixed-asynchronous. The effect of distributing the training process is addressed from two points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied. The results show improvements in both areas. On the one hand, the experiments show that a neural network can be trained substantially faster: the training time drops from 106 hours to 16 hours with mixed-asynchronous training and to 12 hours with synchronous training, and increasing the number of GPUs in one node raises the throughput (images per second) in a near-linear way. On the other hand, the synchronous method maintains the accuracy of single-node training.
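
The thesis itself targets the 2017-era TensorFlow distributed runtime on the Minotauro cluster; purely as an illustration of the synchronous data-parallel idea described above, the minimal sketch below uses today's tf.distribute API on the GPUs of a single node. The model, dataset and hyperparameters are placeholders, not the thesis configuration.

# Minimal sketch (not the thesis's exact setup) of synchronous data-parallel
# training across the GPUs of one node with TensorFlow's tf.distribute API.
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and averages the
# gradients across replicas before each update, i.e. synchronous training.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Scale the global batch size with the number of replicas so each GPU keeps
# the same per-device workload as in single-GPU training.
per_gpu_batch = 32
global_batch = per_gpu_batch * strategy.num_replicas_in_sync

# Placeholder data: random images/labels standing in for an ImageNet-style set.
images = tf.random.uniform((256, 224, 224, 3))
labels = tf.random.uniform((256,), maxval=1000, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(global_batch)

with strategy.scope():
    # Any Keras vision model works here; ResNet50 stands in for the
    # "state-of-the-art neural network for computer vision" of the abstract.
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=["accuracy"],
    )

model.fit(dataset, epochs=1)

Because every replica applies the same averaged gradient, this synchronous scheme behaves like single-node training with a larger batch, which is why the abstract reports that accuracy is preserved; the mixed-asynchronous variant trades some of that consistency for reduced synchronization overhead.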

