Survey paper on Deep Learning on GPUs
@article{mittal2019survey,
  author={Mittal, Sparsh and Vaishay, Shraiysh},
  title={Survey paper on Deep Learning on GPUs},
  journal={Journal of Systems Architecture},
  year={2019}
}
The rise of deep learning (DL) has been fuelled by improvements in accelerators, and the GPU remains the most widely used accelerator for DL applications. We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review more than 75 techniques, covering both inference and training, on both single-GPU and distributed multi-GPU systems. The survey covers pruning, tiling, batching, the impact of data layouts, data-reuse schemes, and convolution strategies (FFT, direct, GEMM, Winograd). It also covers techniques for offloading data to CPU memory to avoid GPU-memory bottlenecks during training.
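To give a flavour of one of the surveyed convolution strategies, below is a minimal NumPy sketch of GEMM-based convolution via im2col, where sliding-window patches are flattened into a matrix so the convolution becomes a single matrix product (the operation GPUs execute most efficiently). The function names and the single-channel, stride-1 simplification are illustrative choices, not taken from the paper.

```python
import numpy as np

def im2col(x, kh, kw):
    """Flatten every kh x kw patch of a 2-D input into one row of a matrix."""
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_gemm(x, k):
    """GEMM-based convolution: one matrix product replaces the sliding window."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (im2col(x, kh, kw) @ k.ravel()).reshape(oh, ow)

def conv2d_direct(x, k):
    """Reference direct (sliding-window) convolution for comparison."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

The trade-off this sketch exposes is the one the surveyed papers weigh: im2col duplicates input data (each pixel appears in up to kh*kw rows), spending memory to turn convolution into a dense GEMM that maps well onto GPU tensor cores.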
The paper, accepted in the Journal of Systems Architecture (2019), is available here.