Scaling Deep Learning on Multiple In-Memory Processors
AMD Research, Advanced Micro Devices, Inc.
3rd Workshop on Near-Data Processing, in conjunction with MICRO-48, 2015
@article{xu2015scaling,
  title={Scaling Deep Learning on Multiple In-Memory Processors},
  author={Xu, Lifan and Zhang, Dong Ping and Jayasena, Nuwan},
  year={2015}
}
Deep learning methods are proven to be state-of-the-art in addressing many challenges in machine learning domains. However, this performance comes at the cost of high computational requirements and energy consumption. The emergence of Processing In Memory (PIM) with die-stacking technology presents an opportunity to speed up deep learning computation and reduce energy consumption by providing low-cost, high-bandwidth memory accesses. PIM uses 3D die stacking to move computation closer to memory and thereby reduce data-movement overheads. In this paper, we study the parallelization of deep learning methods on a system with multiple PIM devices. We select three typical layers from common deep learning models, the convolutional, pooling, and fully connected layers, and parallelize them using different schemes. Preliminary results show that multiple PIM devices achieve competitive or even better performance compared with traditional GPU parallelization.
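The abstract does not spell out the parallelization schemes themselves. Purely as illustration, the Python/NumPy sketch below shows one plausible scheme for the convolutional layer: batch-level data parallelism, where each PIM device processes its own shard of the input batch near its local memory stack. The names conv2d, run_on_pim_devices, and num_devices are hypothetical stand-ins, not the paper's actual implementation or API.

import numpy as np

def conv2d(x, w):
    # Naive single-channel 2-D cross-correlation (deep-learning "convolution").
    # x: (N, H, W) batch, w: (kH, kW) kernel -> (N, H-kH+1, W-kW+1) output.
    N, H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((N, H - kH + 1, W - kW + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = (x[:, i:i+kH, j:j+kW] * w).sum(axis=(1, 2))
    return out

def run_on_pim_devices(x, w, num_devices=4):
    # Hypothetical multi-PIM execution: split the batch across devices so
    # each device convolves its shard against a replicated kernel near its
    # own memory stack; this data-parallel scheme needs no cross-device
    # traffic during the forward pass.
    shards = np.array_split(x, num_devices, axis=0)
    partial = [conv2d(s, w) for s in shards]  # one shard per PIM device
    return np.concatenate(partial, axis=0)

batch = np.random.rand(16, 28, 28)
kernel = np.random.rand(5, 5)
out = run_on_pim_devices(batch, kernel)
print(out.shape)  # (16, 24, 24)

A pooling layer partitions the same way; a fully connected layer would instead need its weight matrix or activations partitioned across devices, which is presumably why the paper uses different schemes per layer type.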