high performance computing on graphics processing units: hgpu.org


CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices

Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, Yaoxue Zhang

Central South University, Microsoft Research

The 20th ACM International Conference on Mobile Systems, Applications, and Services, 2022

@article{jia2022codl,
  title={CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices},
  author={Jia, Fucheng and Zhang, Deyu and Cao, Ting and Jiang, Shiqi and Liu, Yunxin and Ren, Ju and Zhang, Yaoxue},
  year={2022}
}


Concurrent inference execution on heterogeneous processors is critical to improving the performance of increasingly heavy deep learning (DL) models. However, available inference frameworks either use only one processor at a time or achieve little speedup from concurrent execution compared with using a single processor. This stems from two challenges: 1) reducing data sharing overhead, and 2) properly partitioning each operator between processors. To address these challenges, we propose CoDL, a concurrent DL inference framework for the CPU and GPU on mobile devices. It can fully utilize the heterogeneous processors to accelerate each operator of a model. CoDL integrates two novel techniques: 1) hybrid-type-friendly data sharing, which allows each processor to use its efficient data type for inference, complemented by hybrid-dimension partitioning and operator chain methods that further reduce data sharing overhead; and 2) non-linearity- and concurrency-aware latency prediction, which guides operator partitioning by building an extremely lightweight but accurate latency predictor for the different processors. Based on these two techniques, we build the end-to-end CoDL inference framework and evaluate it on different DL models. The results show up to 4.93× speedup and 62.3% energy saving compared with the state-of-the-art concurrent execution system.
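The idea of latency-guided operator partitioning described in the abstract can be illustrated with a minimal sketch. It is not CoDL's implementation: the function `choose_split`, the toy latency models `toy_cpu_us` and `toy_gpu_us`, and all constants are hypothetical, and an exhaustive search stands in for whatever partitioning procedure the paper actually uses. The sketch assumes an operator can be split along one dimension (here, output rows), that both processors run concurrently, and that any split pays a fixed data sharing cost. The GPU model is deliberately a step function to mimic the kind of non-linearity a latency predictor would have to capture.

```python
from typing import Callable, Tuple


def choose_split(
    total_rows: int,
    cpu_latency_us: Callable[[int], int],
    gpu_latency_us: Callable[[int], int],
    sharing_overhead_us: int,
) -> Tuple[int, int]:
    """Return (rows_on_cpu, predicted_latency_us) with the lowest predicted latency.

    Hypothetical helper: exhaustively tries every row split and compares it
    with running the whole operator on a single processor.
    """
    # Single-processor baselines need no data sharing.
    best_rows, best_latency = 0, gpu_latency_us(total_rows)
    cpu_only = cpu_latency_us(total_rows)
    if cpu_only < best_latency:
        best_rows, best_latency = total_rows, cpu_only

    # Co-execution: both parts run concurrently, so the slower part dominates,
    # and the split pays the (assumed fixed) sharing overhead.
    for rows_cpu in range(1, total_rows):
        latency = max(
            cpu_latency_us(rows_cpu),
            gpu_latency_us(total_rows - rows_cpu),
        ) + sharing_overhead_us
        if latency < best_latency:
            best_rows, best_latency = rows_cpu, latency
    return best_rows, best_latency


def toy_cpu_us(rows: int) -> int:
    # Assumed model: fixed startup cost plus linear cost per row.
    return 500 + 100 * rows


def toy_gpu_us(rows: int) -> int:
    # Assumed model: rows are processed in blocks of 4, so latency is a
    # step function of the row count (non-linear in the partition size).
    blocks = -(-rows // 4)  # ceiling division
    return 1000 + 160 * blocks


print(choose_split(64, toy_cpu_us, toy_gpu_us, sharing_overhead_us=200))  # (20, 2960)
```

With these toy numbers, co-execution (2960 µs) beats GPU-only execution (3560 µs). If the sharing overhead is raised far enough, the same search falls back to a single processor, which mirrors the abstract's point that data sharing overhead determines whether concurrent execution pays off at all.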

Tags: Computer science, Deep learning, Heterogeneous systems, Package, Performance

June 19, 2022 by hgpu

