HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting
The University of Hong Kong, Hong Kong, China
arXiv:1901.00858 [cs.LG], (3 Jan 2019)
@article{ji2019hgcaffe,
title={HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting},
author={Ji, Zhuoran},
year={2019},
month={jan},
archivePrefix={arXiv},
eprint={1901.00858},
primaryClass={cs.LG}
}
Breakthroughs in the fields of deep learning and mobile system-on-chips are radically changing the way we use our smartphones. However, deep neural network inference remains a challenging task for edge AI devices due to the computational overhead on mobile CPUs and the severe drain on batteries. In this paper, we present HG-Caffe, a deep neural network inference engine that supports GPUs with half precision. HG-Caffe provides up to 20 times speedup with GPUs compared to the original implementations. In addition to the speedup, peak memory usage is also reduced to about 80% of the original. With HG-Caffe, more innovative and fascinating mobile applications will be turned into reality.
January 13, 2019 by hgpu