Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

P. Judd, J. Albericio, N. Enright Jerger, A. Moshovos, T. Hetherington, T. Aamodt
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
2nd Workshop On Approximate Computing (WAPCO), 2016


@inproceedings{judd2016proteus,
   title={Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks},
   author={Judd, Patrick and Albericio, J and Enright Jerger, N and Moshovos, A and Hetherington, T and Aamodt, T},
   booktitle={2nd Workshop On Approximate Computing (WAPCO)},
   year={2016}
}





This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced-precision numerical representations and, specifically, their ability to use a different representation per layer while maintaining accuracy. This flexibility provides an additional opportunity to improve performance and energy compared to conventional DNN implementations that use a single, uniform representation for all layers throughout the network. This work exploits this property by proposing PROTEUS, a layered extension over existing DNN implementations that converts between the numerical representation used by the DNN execution engines and a shorter, layer-specific fixed-point representation when reading and writing data values to memory, be it on-chip buffers or off-chip memory. When used with a modified layout of data in memory, PROTEUS can use a simple, low-cost, and low-energy conversion unit. On five popular DNNs, PROTEUS reduces data traffic among layers by 41% on average and by up to 44% compared to a baseline that uses a 16-bit fixed-point representation, while maintaining accuracy within 1% even when compared to a single-precision floating-point implementation. When incorporated into a state-of-the-art accelerator, PROTEUS improves energy by 14% while maintaining the same performance. When incorporated into a graphics processor, PROTEUS improves performance by 1%, improves energy by 4%, and reduces off-chip DRAM accesses by 46%.
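The core idea — converting values between a wide uniform representation and a narrower, layer-specific fixed-point format at the memory boundary — can be sketched in software. The following is a minimal illustrative model, not the paper's hardware conversion unit; the function names, the bit/fraction parameters, and the example layer table are our own assumptions:

```python
import numpy as np

def to_fixed(x, bits, frac_bits):
    """Quantize floats to a signed fixed-point grid with `bits` total
    bits, of which `frac_bits` are fractional (illustrative sketch)."""
    scale = 1 << frac_bits
    lo = -(1 << (bits - 1))          # most negative representable code
    hi = (1 << (bits - 1)) - 1       # most positive representable code
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_fixed(q, frac_bits):
    """Convert fixed-point codes back to floats for the execution engine."""
    return q.astype(np.float32) / (1 << frac_bits)

# Hypothetical per-layer precision profile: (total bits, fractional bits).
# One layer may tolerate fewer bits than another while accuracy holds.
layer_bits = {"conv1": (10, 6), "conv2": (8, 5)}

acts = np.array([0.75, -1.5, 0.123], dtype=np.float32)
bits, frac = layer_bits["conv2"]
q = to_fixed(acts, bits, frac)       # what would be written to memory
approx = from_fixed(q, frac)         # what the compute engine reads back
# Storing 8-bit codes instead of 16-bit values halves this layer's traffic.
```

Values exactly representable on the layer's grid (e.g. 0.75 with 5 fractional bits) round-trip losslessly, while others incur a small quantization error — the per-layer accuracy study in the paper is what bounds that error to within 1%.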

* * *


HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors
