Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory


Mohsen Imani, Daniel Peroni, Yeseong Kim, Abbas Rahimi, Tajana Rosing

CSE, UC San Diego, La Jolla, CA 92093, USA

ACM/IEEE Design, Automation, and Test in Europe Conference (DATE), 2017


Neural networks have recently proven to be effective models for image processing, video segmentation, speech recognition, computer vision and gaming. However, high energy consumption and low performance are the primary bottlenecks when running them. In this paper, we propose an energy- and performance-efficient neural network acceleration technique for the General Purpose GPU (GPGPU) architecture. It uses specialized resistive nearest content addressable memory blocks, called NNCAM, and exploits the computation locality of the learning algorithms. NNCAM stores highly frequent patterns corresponding to neural network operations and searches for the most similar patterns in order to reuse their computation results. To improve NNCAM computation efficiency and accuracy, we propose layer-based associative update and selective approximation techniques. The layer-based update improves the data locality of NNCAM blocks by filling them with the frequent computation patterns of each neural network layer. To guarantee an appropriate level of computation accuracy while maximizing energy savings, our design adaptively allocates neural network operations to either NNCAM or the GPGPU floating point units (FPUs). The selective approximation relaxes computation on individual neural network layers according to their impact on accuracy. For evaluation, we integrate NNCAM blocks into the modern AMD Southern Islands GPU architecture. Our experiments show that the enhanced GPGPU achieves 68% energy savings and a 40% speedup on four popular convolutional neural networks (CNNs), while keeping quality loss below 2%.
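The reuse idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class `NearestCAM`, the function `approx_multiply`, the L1 distance over operand values, and the `max_distance` threshold are all hypothetical names and choices introduced here for illustration. The table is filled with the most frequent operand pairs seen during profiling, a lookup returns the stored result of the closest pair, and an operation falls back to an exact multiply when no stored pattern is close enough.

```python
from collections import Counter
from operator import mul


class NearestCAM:
    """Toy software model of a nearest-match lookup table (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = []  # list of ((a, b), precomputed_result)

    def fill(self, observed_pairs, op):
        """Keep the `capacity` most frequent operand pairs and their results."""
        counts = Counter(observed_pairs)
        self.rows = [
            (pair, op(*pair)) for pair, _ in counts.most_common(self.capacity)
        ]

    def lookup(self, a, b):
        """Return (result, distance) for the stored pair closest to (a, b).

        Distance is the L1 distance between operand values, an assumption
        made for this sketch only.
        """
        def distance(row):
            (sa, sb), _ = row
            return abs(sa - a) + abs(sb - b)

        best = min(self.rows, key=distance)
        return best[1], distance(best)


def approx_multiply(cam, a, b, max_distance):
    """Reuse a stored result when a close pattern exists, else compute exactly.

    `max_distance` is a hypothetical knob standing in for the choice between
    the approximate table and the exact floating point path.
    """
    result, dist = cam.lookup(a, b)
    if dist <= max_distance:
        return result   # approximate: reuse the stored result
    return a * b        # exact fallback


# Profiled operand pairs for one layer (made-up values for illustration).
profile = [(1.0, 2.0)] * 3 + [(3.0, 4.0)] * 2 + [(5.0, 5.0)]
cam = NearestCAM(capacity=2)
cam.fill(profile, mul)

print(approx_multiply(cam, 1.1, 2.0, max_distance=0.25))  # reuses (1.0, 2.0)
print(approx_multiply(cam, 5.0, 5.0, max_distance=0.25))  # exact fallback
```

In this model, keeping one table per layer and refilling it from that layer's profile would loosely mirror the layer-based update, and choosing a different `max_distance` per layer would loosely mirror selective approximation.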

Tags: Algorithms, ATI, ATI Radeon HD 7970, Computer science, Computer vision, Image processing, Neural networks, OpenCL

February 2, 2017 by hgpu

