16942

Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core

Gaurav Raina
Eindhoven University of Technology
Eindhoven University of Technology, 2016

@article{raina2016deep,

   title={Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core},

   author={RAINA, Gaurav and Corporaal, Henk and Cuijpers, Pieter and Peemen, Maurice and Rauwerda, Gerard},

   year={2016},

   publisher={Technische Universiteit Eindhoven}

}

Download Download (PDF)   View View   Source Source   

564

views

With a sharp decline in camera cost and size along with superior computing power available at increasingly low prices, computer vision applications are becoming ever present in our daily lives. Research shows that Convolutional Neural Networks (ConvNet) can outperform all other methods for computer vision tasks (such as object detection) in terms of accuracy and versatility [31]. One of the problems with these Neural Networks, which mimic the brain, is that they can be very demanding on the processor, requiring millions of computational nodes to function. Hence, it is challenging for Neural Network algorithms to achieve real-time performance on general purpose embedded platforms. Parallelization and vectorization are very effective ways to ease this problem and make it possible to implement such ConvNets on energy efficient embedded platforms. This thesis presents the evaluation of a novel ConvNet for road speed sign detection [38], on a breakthrough 57-core Intel Xeon Phi processor with 512-bit vector support. This mapping demonstrates that the parallelism inherent in the ConvNet algorithm can be effectively exploited by the 512-bit vector ISA and by utilizing the many core paradigm. Detailed evaluation shows that the best mappings require data-reuse strategies that exploit reuse at the cache and register level. These implementations are boosted by the use of low-level vector intrinsics (which are C style functions that map directly onto many Intel assembly instructions). Ultimately we demonstrate an approach which can be used to accelerate Neural Networks on highly-parallel many core processors, with execution speedups of more than 12x on single core performance alone.
Rating: 1.5. From 2 votes.
Please wait...

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: