high performance computing on graphics processing units: hgpu.org


A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

Qiyang Zhang, Xiang Li, Xiangying Che, Xiao Ma, Ao Zhou, Mengwei Xu, Shangguang Wang, Yun Ma, Xuanzhe Liu

Beijing University of Posts and Telecommunications

arXiv:2202.06512 [cs.LG], (14 Feb 2022)

DOI:10.48550/arXiv.2202.06512


Package: Mobile-DL-benchmark: A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices


Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. For fast on-device DL inference, DL libraries play a role as critical as that of algorithms and hardware. Unfortunately, no prior work has examined the ecosystem of modern DL libraries in depth or provided quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libraries and 15 diverse DL models. We then perform extensive experiments on 10 mobile devices, which reveal a complete landscape of the current mobile DL library ecosystem. For example, we find that which DL library performs best is severely fragmented across different models and hardware, and that the performance gap between DL libraries can be very large. In fact, the impact of the choice of DL library can outweigh optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, based on these observations, we summarize practical implications for the different roles in the DL library ecosystem.

Tags: ARM, Benchmarking, Computer science, Deep learning, OpenCL, OpenGL, Vulkan

February 20, 2022 by hgpu



