
swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang
Tsinghua University, National Supercomputing Center in Wuxi
arXiv:1903.06934 [cs.DC] (16 Mar 2019)

@misc{fang2019swcaffe,
   title={swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight},
   author={Jiarui Fang and Liandeng Li and Haohuan Fu and Jinlei Jiang and Wenlai Zhao and Conghui He and Xin You and Guangwen Yang},
   year={2019},
   eprint={1903.06934},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}

This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating deep neural network (DNN) training on Sunway TaihuLight, the current fastest supercomputer in the world, which adopts a unique many-core heterogeneous architecture with 40,960 SW26010 processors connected through a customized communication network. First, we present principles for fully exploiting the performance of this innovative many-core architecture. Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe. Third, we put forward a topology-aware parameter synchronization scheme that scales the synchronous Stochastic Gradient Descent (SGD) method efficiently to multiple processors. We evaluate our framework by training a variety of widely used neural networks on the ImageNet dataset. On a single node, swCaffe achieves 23% to 119% of the overall performance of Caffe running on an NVIDIA K40m GPU. Compared with Caffe on CPU, swCaffe runs 3.04x to 7.84x faster across all the networks. Finally, we evaluate the scalability of swCaffe when training ResNet-50 and AlexNet on up to 1024 nodes.
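To illustrate the parameter synchronization idea behind synchronous SGD (not the paper's topology-aware scheme itself), the following minimal Python sketch uses mpi4py and NumPy to average locally computed gradients across workers with an allreduce, so every worker applies the same update. The parameter size, gradient values, and learning rate are hypothetical placeholders.

# Minimal synchronous-SGD sketch using MPI allreduce (illustrative only,
# not swCaffe's implementation). Run with: mpirun -np 4 python sync_sgd.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Hypothetical flat parameter vector and a locally computed gradient
# (stands in for the result of a backward pass on a local mini-batch).
params = np.zeros(1024, dtype=np.float32)
local_grad = np.random.randn(1024).astype(np.float32)

# Sum gradients from all workers, then divide to get the global average.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

# Each worker applies the identical update, keeping parameters in sync.
lr = 0.01
params -= lr * global_grad

swCaffe replaces this flat allreduce with a synchronization scheme that is aware of TaihuLight's customized network topology, which is what allows the training to scale to 1024 nodes.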
