Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform

Ying Zhang, Saizheng Zhang
Department of Automation, University of Science and Technology of China, Hefei 230027, China
25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’13), 2013

@article{zhang2013optimized,
   title={Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform},
   author={Zhang, Ying and Zhang, Saizheng},
   year={2013}
}

In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g. NVIDIA's GPUs). Carefully designed layer-wise strategies integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are used in the propagation processes of the deep architectures. In our experiments, these kernels take 70% less time on average than the kernels in NVIDIA's CUBLAS library (widely used by many other neural network toolkits), and they help our parallel deep architecture outperform neural structures built on CUBLAS kernels in practical problems.
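
To illustrate why matrix kernels dominate the propagation step, the sketch below writes the forward pass of one fully connected layer as a matrix product computed tile by tile. It is not the authors' kernel and says nothing about their GPU implementation: the function names `matmul_tiled` and `dense_forward`, the tile size, and the sigmoid activation are all assumptions chosen for illustration, and the code runs on the CPU in plain Python.

```python
import math


def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    Tiling is the usual idea behind fast GPU matrix kernels: each small
    block of the operands is reused many times once it has been loaded.
    This is an illustrative CPU version, not a GPU kernel.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile rows of the output
        for j0 in range(0, m, tile):      # tile columns of the output
            for p0 in range(0, k, tile):  # tiles along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def dense_forward(x, w, bias):
    """Forward pass of one layer (assumed sigmoid): sigmoid(x @ w + bias).

    x is a batch of inputs (batch x inputs), w is (inputs x outputs),
    and bias has one entry per output unit.
    """
    z = matmul_tiled(x, w)
    return [
        [1.0 / (1.0 + math.exp(-(row[j] + bias[j]))) for j in range(len(bias))]
        for row in z
    ]
```

Calling `dense_forward(batch, weights, bias)` once per layer propagates a whole batch, so nearly all of the arithmetic sits inside `matmul_tiled`; that is the part a faster matrix kernel would replace.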