high performance computing on graphics processing units: hgpu.org

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

Northwestern Polytechnical University

arXiv:2405.01851 [cs.LG] (3 May 2024)

DOI:10.48550/arXiv.2405.01851

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, mobile devices have the potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in dynamic and diverse real-world mobile environments remains underexplored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

Tags: Computer science, Deep learning, Heterogeneous systems

May 12, 2024 by hgpu
