high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers, Karin Strauss, Eric S. Chung

Microsoft Research

Microsoft Research, 2015

BibTeX

Download (PDF)

View

Source

3422

views

Recent breakthroughs in the development of multi-layer convolutional neural networks have led to stateof-the-art improvements in the accuracy of non-trivial recognition tasks such as large-category image classification and automatic speech recognition [1]. These many-layered neural networks are large, complex, and require substantial computing resources to train and evaluate [2]. Unfortunately, these demands come at an inopportune moment due to the recent slowing of gains in commodity processor performance. Hardware specialization in the form of GPGPUs, FPGAs, and ASICs1 offers a promising path towards major leaps in processing capability while achieving high energy efficiency. To harness specialization, an effort is underway at Microsoft to accelerate Deep Convolutional Neural Networks (CNN) using servers augmented with FPGAs-similar to the hardware that is being integrated into some of Microsoft’s datacenters [3]. Initial efforts to implement a single-node CNN accelerator on a mid-range FPGA show significant promise, resulting in respectable performance relative to prior FPGA designs and high-end GPGPUs, at a fraction of the power. In the future, combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.

Tags: Computer science, FPGA, Machine learning, Neural networks, Speech recognition

February 27, 2015 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Your response

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

Most viewed papers (last 30 days)

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)