CANNA: Neural Network Acceleration using Configurable Approximation on GPGPU
CSE Department, UC San Diego, La Jolla, CA 92093, USA
23rd Asia and South Pacific Design Automation Conference (ASPDAC ’18), 2018
@inproceedings{imani2018canna,
title={CANNA: Neural Network Acceleration using Configurable Approximation on GPGPU},
author={Imani, Mohsen and Masich, Max and Peroni, Daniel and Wang, Pushen and Rosing, Tajana},
booktitle={23rd Asia and South Pacific Design Automation Conference (ASPDAC '18)},
year={2018}
}
Neural networks have been successfully used in many applications. Due to their computational complexity, they are difficult to implement on embedded devices. Neural networks are inherently approximate and can thus be simplified. In this paper, CANNA proposes a gradual training approximation which adaptively sets the level of hardware approximation depending on the neural network’s internal error, instead of applying uniform hardware approximation. To accelerate inference, CANNA’s layer-based approximation approach selectively relaxes the computation in each layer of the neural network, as a function of its sensitivity to approximation. For hardware support, we use a configurable floating-point unit that dynamically identifies inputs which produce the largest approximation error and processes them in precise mode instead. We evaluate the accuracy and efficiency of our design by integrating configurable FPUs into AMD’s Southern Islands GPU architecture. Our experimental evaluation shows that CANNA achieves up to 4.84x (7.13x) energy savings and a 3.22x (4.64x) speedup when training four different neural network applications with 0% (2%) quality loss compared to the baseline GPU implementation. During the inference phase, our layer-based approach improves energy efficiency by 4.42x (6.06x) and yields a 2.96x (3.98x) speedup while ensuring 0% (2%) quality loss.
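The gradual training approximation described above can be sketched as a simple policy that maps the network's current training error to a hardware approximation level: while the error is still large, aggressive approximation is tolerable, and precision is tightened as training converges. This is a minimal software sketch; the function name, number of levels, and thresholds are illustrative assumptions, not values from the paper.

```python
def select_approximation_level(train_error, thresholds=(0.05, 0.10, 0.20)):
    """Map the current training error to a hardware approximation level.

    Level 0 = fully precise, higher levels = more aggressive approximation.
    The thresholds here are illustrative placeholders; CANNA derives the
    level from the neural network's internal error during training.
    """
    level = 0
    for t in thresholds:
        if train_error > t:
            level += 1  # more residual error leaves headroom for approximation
    return level


# Early in training the error is large, so a deep approximation level is chosen;
# near convergence the policy falls back toward precise computation.
print(select_approximation_level(0.50))  # aggressive approximation
print(select_approximation_level(0.01))  # precise mode
```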
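The configurable FPU's behavior can likewise be modeled in software as an approximate multiply that truncates low mantissa bits and falls back to the precise product when the resulting error is too large. This is only a sketch under stated assumptions: the names, the number of dropped bits, and the tolerance are illustrative, and a real configurable FPU would flag high-error operands from their bit patterns rather than computing the precise result to check against.

```python
import struct

def truncate_mantissa(x, drop_bits=12):
    """Zero out the low mantissa bits of a float32 value.

    Models a simplified approximate-mode datapath; drop_bits is an
    illustrative choice, not a value from the paper.
    """
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    bits &= ~((1 << drop_bits) - 1)
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def configurable_mul(a, b, drop_bits=12, tol=1e-2):
    """Approximate multiply with a precise-mode fallback.

    Here the exact product is computed purely to illustrate the
    fallback decision; in hardware the error would be estimated
    from the operands themselves.
    """
    approx = truncate_mantissa(a, drop_bits) * truncate_mantissa(b, drop_bits)
    precise = a * b
    if precise != 0.0 and abs(approx - precise) / abs(precise) > tol:
        return precise  # input produces too much error: process in precise mode
    return approx
```

By construction the relative error of `configurable_mul` never exceeds `tol`, which mirrors the paper's guarantee that inputs producing the largest approximation error are processed precisely.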
April 22, 2018 by hgpu