Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
Department of Automation, University of Science and Technology of China, Hefei 230027, China
25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’13), 2013
@inproceedings{zhang2013optimized,
  title={Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform},
  author={Zhang, Ying and Zhang, Saizheng},
  booktitle={25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)},
  year={2013}
}
In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g. NVIDIA's GPU). Carefully designed layer-wise strategies are used to integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in the deep architectures' propagation processes. In our experiments, these kernels save 70% time on average compared with the kernels in NVIDIA's CUBLAS library (widely used by many other neural network toolkits), and help our parallel deep architecture outperform neural structures using CUBLAS kernels on practical problems.
February 2, 2014 by hgpu