GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

Syed Tahir Hussain Rizvi, Gianpiero Cabodi, Denis Patti, Gianluca Francini
Dipartimento di Automatica e Informatica (DAUIN), Politecnico di Torino, Turin 10129, Italy
Electronics, 5(4), 88, 2016

@article{electronics5040088,
   author={Rizvi, Syed Tahir Hussain and Cabodi, Gianpiero and Patti, Denis and Francini, Gianluca},
   title={GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform},
   journal={Electronics},
   volume={5},
   year={2016},
   number={4},
   pages={88},
   url={http://www.mdpi.com/2079-9292/5/4/88},
   issn={2079-9292},
   doi={10.3390/electronics5040088}
}

Deep convolutional neural networks achieve state-of-the-art performance in image classification. Their computational and memory requirements are, however, very large, which is a problem on resource-constrained embedded devices. Most of this complexity derives from the convolutional layers, and in particular from the matrix multiplications they entail. This paper proposes a complete approach to image classification that provides the layers commonly used in neural networks. Specifically, the proposed approach relies on a heterogeneous CPU-GPU scheme for performing convolutions in the transform domain. The Compute Unified Device Architecture (CUDA)-based implementation of the proposed approach is evaluated over three different image classification networks on a Tegra K1 CPU-GPU mobile processor. Experiments show that the presented heterogeneous scheme achieves a 50x speedup over the CPU-only reference and outperforms a GPU-based reference by 2x, while reducing power consumption by nearly 30%.
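
The abstract notes that most of the cost of a convolutional layer comes from the matrix multiplications it entails. As a rough illustration only, the sketch below shows one common way a convolution is recast as a single matrix product, usually called im2col. This is an assumption made for illustration: the abstract does not say that this is the transform the authors use, and the function names `im2col` and `conv2d_as_matmul` are hypothetical. It is written in plain Python for readability rather than CUDA.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image (list of rows) into one flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def conv2d_as_matmul(image, kernels):
    """Apply several k x k kernels to one image with a single matrix product.

    Computes the 'valid' cross-correlation used by CNN layers (no kernel flip).
    Returns one flat list of outputs per kernel, in row-major patch order.
    """
    k = len(kernels[0])
    patches = im2col(image, k)                       # shape: (num_patches, k*k)
    flat_kernels = [[v for row in kern for v in row]  # shape: (num_kernels, k*k)
                    for kern in kernels]
    # Matrix product flat_kernels x patches^T: each entry is one dot product.
    return [[sum(a * b for a, b in zip(f, p)) for p in patches]
            for f in flat_kernels]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernels = [[[1, 0], [0, 1]],
               [[1, 1], [1, 1]]]
    print(conv2d_as_matmul(image, kernels))
```

The point of the rewrite is that all kernels and all patches are handled by one dense product, which is the kind of regular, data-parallel work a GPU handles well. The price is that overlapping patches are duplicated in memory, which matters on a memory-constrained device.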