high performance computing on graphics processing units: hgpu.org

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang

Amazon Web Services, East Palo Alto, CA, USA

arXiv:1907.02154 [cs.DC], (3 Jul 2019)

Source code package: tvm: Open deep learning compiler stack for CPU, GPU and specialized accelerators

Modern deep learning applications increasingly push model inference to edge devices for several reasons: shorter latency, less load on the network connection to the cloud, and better protection of user privacy. The Convolutional Neural Network (CNN) is one of the most widely used model families in these applications. Given the high computational complexity of CNN models, it is favorable to execute them on the integrated GPUs of edge devices, which are ubiquitous and offer more computing power and better energy efficiency than the accompanying CPUs. However, programming integrated GPUs efficiently is challenging because of the variety of their architectures and programming interfaces. This paper proposes an end-to-end solution for CNN model inference on integrated GPUs at the edge. It uses a unified IR to represent and optimize vision-specific operators on integrated GPUs from multiple vendors, and it leverages machine learning-based scheduling search schemes to optimize computationally intensive operators such as convolution. Our solution also provides a fallback mechanism for operators that are not suitable or convenient to run on GPUs. The evaluation results suggest that, compared to state-of-the-art solutions backed by vendor-provided high-performance libraries on Intel Graphics, ARM Mali GPU, and Nvidia integrated Maxwell GPU, our solution achieves similar or even better (up to 1.62x) performance on a number of popular image classification and object detection models. In addition, our solution has wider model coverage and adapts more flexibly to new models. It has been adopted in production services in AWS and is open-sourced.
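The fallback idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation and does not use the TVM API; the `Op` class, the `GPU_FRIENDLY_OPS` set, and both helper functions are hypothetical names introduced only for illustration. The sketch assumes a model that is a simple linear chain of operators, places each operator on the GPU when a GPU implementation is assumed to exist, and otherwise falls back to the CPU.

```python
from dataclasses import dataclass

# Hypothetical set of operator types assumed to have an efficient GPU kernel.
GPU_FRIENDLY_OPS = {"conv2d", "dense", "relu", "pooling", "softmax"}


@dataclass
class Op:
    name: str  # unique node name in the model
    kind: str  # operator type, e.g. "conv2d"


def assign_devices(ops, gpu_ops=GPU_FRIENDLY_OPS):
    """Place each operator on the GPU if supported, otherwise fall back to the CPU."""
    return [(op, "gpu" if op.kind in gpu_ops else "cpu") for op in ops]


def count_device_switches(placement):
    """Count how often consecutive operators in the chain run on different devices."""
    devices = [dev for _, dev in placement]
    return sum(1 for a, b in zip(devices, devices[1:]) if a != b)


# A toy linear model; "custom_op" stands in for an operator without a GPU kernel.
model = [
    Op("conv1", "conv2d"),
    Op("relu1", "relu"),
    Op("post", "custom_op"),
    Op("fc", "dense"),
]

placement = assign_devices(model)
for op, dev in placement:
    print(f"{op.name:6s} -> {dev}")
print("device switches:", count_device_switches(placement))
```

In this toy chain only `post` falls back to the CPU, so execution switches devices twice. A real compiler would presumably also weigh the cost of such switches when deciding placement, which this sketch does not attempt.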

Tags: ARM, CNN, Compilers, Computer science, Deep learning, Machine learning, Neural networks, nVidia, nVidia Jetson Nano, OpenCL, Package

July 7, 2019 by hgpu
