
CPU-GPU Layer-Switched Low Latency CNN Inference

Ehsan Aghapour, Dolly Sapra, Andy Pimentel, and Anuj Pathania
University of Amsterdam
25th Euromicro Conference on Digital System Design (DSD), 2022

@inproceedings{aghapour2022cpu,
   title={CPU-GPU Layer-Switched Low Latency CNN Inference},
   author={Aghapour, Ehsan and Sapra, Dolly and Pimentel, Andy and Pathania, Anuj},
   booktitle={2022 25th Euromicro Conference on Digital System Design (DSD)},
   year={2022}
}

Convolutional Neural Network (CNN) inference on Heterogeneous Multi-Processor System-on-Chips (HMPSoCs) in edge devices represents cutting-edge embedded machine learning. Both the embedded CPU and the GPU within an HMPSoC can perform CNN inference. However, common practice is to run a CNN entirely on whichever HMPSoC component (CPU or GPU) provides the best performance (lowest latency) for that CNN. CNNs are not monolithic; they are composed of several layers of different types. Some of these layers have lower latency on the CPU, while others execute faster on the GPU. In this work, we investigate the reason behind this observation. We also propose an execution of CNNs that switches between the CPU and the GPU at layer granularity, wherein each CNN layer executes on the component that provides it with the lowest latency. Switching back and forth between the CPU and the GPU mid-inference introduces additional overhead (delay) into the inference. Despite this overhead, we show in this work that CPU-GPU layer-switched execution results, on average, in 4.72% lower CNN inference latency on the Khadas VIM 3 board with its Amlogic A311D HMPSoC.
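
The following Python snippet is a minimal sketch (not the authors' implementation) of how such a layer-switched mapping can be framed: given measured per-layer latencies on the CPU and the GPU, plus a fixed CPU-GPU switching overhead, a simple dynamic program assigns each layer to the device that minimizes end-to-end latency. The function name layer_switched_schedule and all latency values are hypothetical placeholders.

# Hypothetical sketch: choose a device per layer to minimize total latency.
# cpu_lat[i] / gpu_lat[i] are the measured latencies of layer i on each
# component; switch_overhead models the cost of moving data between them.

def layer_switched_schedule(cpu_lat, gpu_lat, switch_overhead):
    """Return (total_latency, devices) where devices[i] is 0 (CPU) or 1 (GPU)."""
    lat = [cpu_lat, gpu_lat]
    n = len(cpu_lat)
    # best[d] = (lowest cost of layers 0..i with layer i on device d,
    #            the corresponding device assignment)
    best = [(lat[d][0], [d]) for d in (0, 1)]
    for i in range(1, n):
        nxt = []
        for d in (0, 1):
            stay = best[d][0]                        # keep the same device
            move = best[1 - d][0] + switch_overhead  # pay to switch devices
            if stay <= move:
                cost, path = stay, best[d][1]
            else:
                cost, path = move, best[1 - d][1]
            nxt.append((cost + lat[d][i], path + [d]))
        best = nxt
    return min(best)

# Fabricated example (milliseconds): some layers favor the CPU, others the GPU.
cpu = [4.0, 9.0, 2.5, 8.0, 3.0]
gpu = [6.0, 3.0, 5.0, 2.0, 4.5]
total, devices = layer_switched_schedule(cpu, gpu, switch_overhead=1.0)
print(total, ["CPU" if d == 0 else "GPU" for d in devices])
# -> 18.5 ['CPU', 'GPU', 'CPU', 'GPU', 'CPU']

Because the switching overhead is charged on every device change, this scheduler only hops between the CPU and the GPU when a layer's latency gap exceeds that cost, which mirrors the overhead-versus-speedup trade-off the paper evaluates.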
