Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures

Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke
Intel Corporation
arXiv:1808.05567 [cs.DC] (16 Aug 2018)

@article{georganas2018anatomy,
   title={Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures},
   author={Georganas, Evangelos and Avancha, Sasikanth and Banerjee, Kunal and Kalamkar, Dhiraj and Henry, Greg and Pabst, Hans and Heinecke, Alexander},
   journal={arXiv preprint arXiv:1808.05567},
   year={2018},
   month={aug},
   eprint={1808.05567},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}

Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs), which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition. The computationally expensive nature of the convolution operation has led to a proliferation of implementations, including matrix-matrix multiplication formulations and direct convolution approaches, primarily targeting GPUs. In this paper, we introduce direct convolution kernels for x86 architectures, in particular for Xeon and Xeon Phi systems, which are implemented via a dynamic compilation approach. Our JIT-based implementation achieves close to theoretical peak performance, depending on the setting and the CPU architecture at hand. We additionally demonstrate how these JIT-optimized kernels can be integrated into a lightweight multi-node graph execution model, showing that single- and multi-node runs yield high efficiency and high image throughput when executing state-of-the-art image recognition tasks on CPUs.
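As a rough illustration of what such a direct convolution kernel computes, the following C sketch spells out the textbook loop nest over a channel-blocked, vectorization-friendly layout. This is an assumption for the example, not the paper's actual JIT-generated code: the function name direct_conv, the block width VLEN and all shapes are illustrative.

/* Illustrative direct-convolution sketch over a channel-blocked layout
 * (an assumption for this example, not code from the paper). Layouts:
 *   in : [C/VLEN][H][W][VLEN]
 *   wt : [K/VLEN][C/VLEN][R][S][VLEN][VLEN]  (input-channel block, then output)
 *   out: [K/VLEN][P][Q][VLEN], with P = H-R+1, Q = W-S+1 (stride 1, no padding)
 * C and K are assumed divisible by VLEN; out must be zero-initialized. */
#include <stdio.h>
#include <stdlib.h>

#define VLEN 16  /* assumed SIMD width: 16 floats = one AVX-512 register */

void direct_conv(int C, int K, int H, int W, int R, int S,
                 const float *in, const float *wt, float *out)
{
    const int Cb = C / VLEN, Kb = K / VLEN;
    const int P = H - R + 1, Q = W - S + 1;
    for (int kb = 0; kb < Kb; ++kb)
        for (int p = 0; p < P; ++p)
            for (int q = 0; q < Q; ++q) {
                float *o = &out[((size_t)(kb * P + p) * Q + q) * VLEN];
                for (int cb = 0; cb < Cb; ++cb)
                    for (int r = 0; r < R; ++r)
                        for (int s = 0; s < S; ++s)
                            for (int c = 0; c < VLEN; ++c) {
                                /* one input scalar, broadcast against a
                                 * VLEN-wide slice of output channels */
                                float x = in[((size_t)(cb * H + p + r) * W
                                              + q + s) * VLEN + c];
                                const float *w =
                                    &wt[(((size_t)((kb * Cb + cb) * R + r) * S
                                          + s) * VLEN + c) * VLEN];
                                for (int k = 0; k < VLEN; ++k)
                                    o[k] += x * w[k];  /* one SIMD FMA */
                            }
            }
}

int main(void) {
    /* Tiny smoke test with illustrative layer sizes. */
    enum { C = 64, K = 64, H = 14, W = 14, R = 3, S = 3 };
    const int P = H - R + 1, Q = W - S + 1;
    float *in  = calloc((size_t)C * H * W, sizeof *in);
    float *wt  = calloc((size_t)K * C * R * S, sizeof *wt);
    float *out = calloc((size_t)K * P * Q, sizeof *out);
    in[0] = 1.0f;
    wt[0] = 2.0f;
    direct_conv(C, K, H, W, R, S, in, wt, out);
    printf("out[0] = %.1f\n", out[0]);  /* expected: 2.0 */
    free(in); free(wt); free(out);
    return 0;
}

A JIT approach like the one the paper describes would generate this loop nest with the layer's C, K, H, W, R and S baked in as compile-time constants, unrolling and register-blocking the inner loops so that the VLEN-wide update above becomes a single fused multiply-add on one vector register.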