high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, Dongjun Shin

Software R&D Center, Samsung Electronics, Yongin, Korea

arXiv:1511.06530 [cs.CV], (20 Nov 2015)

BibTeX

Download (PDF)

View

Source

3125

views

Although the latest high-end smartphone has powerful CPU and GPU, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The proposed scheme consists of three steps: (1) rank selection with variational Bayesian matrix factorization, (2) Tucker decomposition on kernel tensor, and (3) fine-tuning to recover accumulated loss of accuracy, and each step can be easily implemented using publicly available tools. We demonstrate the effectiveness of the proposed scheme by testing the performance of various compressed CNNs (AlexNet, VGGS, GoogLeNet, and VGG-16) on the smartphone. Significant reductions in model size, runtime, and energy consumption are obtained, at the cost of small loss in accuracy. In addition, we address the important implementation level issue on 1×1 convolution, which is a key operation of inception module of GoogLeNet as well as CNNs compressed by our proposed scheme.

Tags: ARM, Bayesian, Caffe, CNN, Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia GeForce GTX Titan X

November 24, 2015 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Your response

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

Most viewed papers (last 30 days)

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)