Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core

Gaurav Raina

Eindhoven University of Technology

Eindhoven University of Technology, 2016

@article{raina2016deep,
  title={Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core},
  author={Raina, Gaurav and Corporaal, Henk and Cuijpers, Pieter and Peemen, Maurice and Rauwerda, Gerard},
  year={2016},
  publisher={Technische Universiteit Eindhoven}
}

With the sharp decline in camera cost and size, and with greater computing power available at ever lower prices, computer vision applications are becoming ever-present in our daily lives. Research shows that Convolutional Neural Networks (ConvNets) can outperform all other methods for computer vision tasks (such as object detection) in terms of accuracy and versatility [31]. One problem with these neural networks, which mimic the brain, is that they are computationally demanding, requiring millions of computational nodes to function. Hence, it is challenging for neural network algorithms to achieve real-time performance on general-purpose embedded platforms. Parallelization and vectorization are very effective ways to ease this problem and make it possible to implement such ConvNets on energy-efficient embedded platforms. This thesis presents the evaluation of a novel ConvNet for road speed sign detection [38] on a breakthrough 57-core Intel Xeon Phi processor with 512-bit vector support. The mapping demonstrates that the parallelism inherent in the ConvNet algorithm can be effectively exploited by the 512-bit vector ISA and by the many-core paradigm. Detailed evaluation shows that the best mappings require data-reuse strategies that exploit reuse at the cache and register levels. These implementations are boosted by the use of low-level vector intrinsics (C-style functions that map directly onto many Intel assembly instructions). Ultimately, we demonstrate an approach that can be used to accelerate neural networks on highly parallel many-core processors, with execution speedups of more than 12x in single-core performance alone.
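As a rough illustration of the kind of mapping the abstract describes, the C sketch below shows one convolution layer with the three levels of reuse and parallelism it names. It is not the thesis's implementation. The function name `conv2d_valid`, the loop order, and the use of OpenMP pragmas in place of hand-written vector intrinsics are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thesis): "valid" 2D convolution in
 * cross-correlation form. Input is in_h x in_w, kernel is k x k, and the
 * output is (in_h - k + 1) x (in_w - k + 1), all row-major floats. */
void conv2d_valid(const float *in, int in_h, int in_w,
                  const float *kernel, int k, float *out)
{
    assert(k > 0 && in_h >= k && in_w >= k);

    const int out_h = in_h - k + 1;
    const int out_w = in_w - k + 1;

    /* Many-core level: output rows are independent, so they can be
     * distributed across threads. Ignored if OpenMP is not enabled. */
    #pragma omp parallel for
    for (int y = 0; y < out_h; ++y) {
        float *out_row = out + (size_t)y * out_w;

        for (int x = 0; x < out_w; ++x)
            out_row[x] = 0.0f;

        for (int ky = 0; ky < k; ++ky) {
            /* Cache-level reuse: the same input row is read for every kx. */
            const float *in_row = in + (size_t)(y + ky) * in_w;

            for (int kx = 0; kx < k; ++kx) {
                /* Register-level reuse: one weight is loaded once and
                 * applied to a whole row of outputs. */
                const float w = kernel[ky * k + kx];

                /* Vector level: contiguous, independent elements that a
                 * compiler can map onto wide SIMD lanes. */
                #pragma omp simd
                for (int x = 0; x < out_w; ++x)
                    out_row[x] += w * in_row[x + kx];
            }
        }
    }
}
```

Calling `conv2d_valid(in, 4, 4, kernel, 2, out)` on a 4x4 input with a 2x2 kernel fills a 3x3 output buffer. Without an OpenMP flag such as `-fopenmp`, the pragmas are ignored and the code runs serially with the same result.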

Tags: Computer science, Computer vision, Deep learning, Intel Xeon Phi, Neural networks, OpenMP, Thesis

January 26, 2017 by hgpu
