Enabling On-Device Smartphone GPU based Training: Lessons Learned

Anish Das, Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo

University of Cambridge

arXiv:2202.10100 [cs.LG], (21 Feb 2022)

DOI:10.48550/arXiv.2202.10100

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing work has focused on reducing the computational and resource overheads of running Deep Neural Network (DNN) inference on resource-constrained mobile devices. However, the other aspect of DNN operations, i.e., training (forward and backward passes) on smartphone GPUs, has received little attention thus far. To this end, we conduct an initial analysis to examine the feasibility of on-device training on smartphones using mobile GPUs. We first employ the open-source mobile DL framework MNN and its OpenCL backend for running compute kernels on GPUs. Next, we observe that training on CPUs is much faster than on GPUs and identify two possible bottlenecks behind this observation: (i) computation and (ii) memory. To address the computation bottleneck, we optimize the OpenCL backend’s kernels, achieving a 2x improvement in throughput (40-70 GFLOPs) over CPUs (15-30 GFLOPs) on Snapdragon 8 series processors. However, we find that full DNN training is still much slower on GPUs than on CPUs, indicating that the memory bottleneck plays a significant role in the GPU's lower performance relative to the CPU. Data movement takes almost 91% of training time due to the low bandwidth. Lastly, based on the findings and failures during our investigation, we present limitations and practical guidelines for future directions.
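
The abstract's argument, that faster kernels do not help when data movement dominates, can be made concrete with an Amdahl-style calculation. The Python sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that the 91% data-movement share is unaffected by any further kernel speedup and that the remaining 9% is pure compute. Under those assumptions, making the compute kernels another 2x faster shortens end-to-end training by only about 5%, and even infinitely fast kernels give at most about a 1.10x speedup.

```python
def overall_speedup(data_movement_share: float, compute_speedup: float) -> float:
    """Amdahl-style end-to-end speedup when only the compute share gets faster.

    data_movement_share: fraction of training time spent moving data
        (assumed here to be untouched by kernel optimizations).
    compute_speedup: factor by which the compute kernels are accelerated.
    """
    if not 0.0 <= data_movement_share <= 1.0:
        raise ValueError("data_movement_share must be in [0, 1]")
    if compute_speedup <= 0.0:
        raise ValueError("compute_speedup must be positive")
    compute_share = 1.0 - data_movement_share
    return 1.0 / (data_movement_share + compute_share / compute_speedup)


def speedup_limit(data_movement_share: float) -> float:
    """Upper bound on end-to-end speedup as compute time approaches zero."""
    if not 0.0 < data_movement_share <= 1.0:
        raise ValueError("data_movement_share must be in (0, 1]")
    return 1.0 / data_movement_share


if __name__ == "__main__":
    share = 0.91  # data-movement share of training time reported in the abstract
    print(f"2x faster kernels:    {overall_speedup(share, 2.0):.3f}x end to end")
    print(f"infinitely fast ones: {speedup_limit(share):.3f}x end to end")
```

The same function can be used the other way around: plugging in a smaller data-movement share shows how much kernel optimization would start to pay off once the bandwidth bottleneck is reduced.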

Tags: Computer science, Deep learning, Neural networks, OpenCL

March 6, 2022 by hgpu
