swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
Tsinghua University, National Supercomputing Center in Wuxi
arXiv:1903.06934 [cs.DC], 16 Mar 2019
@misc{fang2019swcaffe,
  title={swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight},
  author={Jiarui Fang and Liandeng Li and Haohuan Fu and Jinlei Jiang and Wenlai Zhao and Conghui He and Xin You and Guangwen Yang},
  year={2019},
  eprint={1903.06934},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating deep neural network (DNN) training on Sunway TaihuLight, one of the fastest supercomputers in the world, which adopts a unique many-core heterogeneous architecture with 40,960 SW26010 processors connected through a customized communication network. First, we present a set of design principles for fully exploiting the performance of this many-core architecture. Second, we propose optimization strategies for redesigning a variety of neural network layers on top of Caffe. Third, we put forward a topology-aware parameter synchronization scheme that scales synchronous Stochastic Gradient Descent (SGD) efficiently across many processors. We evaluate our framework by training a variety of widely used neural networks on the ImageNet dataset. On a single node, swCaffe achieves 23%–119% of the performance of Caffe running on an NVIDIA K40m GPU, depending on the network. Compared with Caffe on CPU, swCaffe runs 3.04x–7.84x faster across all networks. Finally, we present the scalability of swCaffe for training ResNet-50 and AlexNet at the scale of 1,024 nodes.
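For context on the parameter synchronization step mentioned above, the following is a minimal sketch of synchronous SGD with allreduce-based gradient averaging, the general pattern that a topology-aware synchronization scheme optimizes. It is not the paper's actual implementation: the use of mpi4py/NumPy and the local_gradient placeholder are assumptions made purely for illustration.

# Minimal synchronous-SGD sketch (illustrative only, not swCaffe code).
# Each MPI rank computes a gradient on its local mini-batch shard, the
# gradients are summed with Allreduce and averaged, and every rank applies
# the identical update so model replicas stay in sync.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world_size = comm.Get_rank(), comm.Get_size()

def local_gradient(params, batch):
    # Hypothetical placeholder: gradient of the loss on this rank's shard.
    return np.zeros_like(params)

params = np.zeros(1_000_000, dtype=np.float32)  # flattened model parameters
lr = 0.01

for step in range(100):
    batch = None                                # each rank loads its own shard
    grad = local_gradient(params, batch)

    # Synchronous step: sum gradients across all ranks, then average.
    global_grad = np.empty_like(grad)
    comm.Allreduce(grad, global_grad, op=MPI.SUM)
    global_grad /= world_size

    # Identical update on every rank keeps parameters consistent.
    params -= lr * global_grad

In practice, the cost of the Allreduce dominates at large node counts, which is why mapping the reduction onto the machine's actual network topology (as the paper's scheme does for TaihuLight's customized interconnect) matters for scaling to 1,024 nodes.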
March 24, 2019 by hgpu