high performance computing on graphics processing units: hgpu.org

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing

The Chinese University of Hong Kong

arXiv:2307.04339 [cs.DC], (10 Jul 2023)

DOI:10.48550/arXiv.2307.04339

Many applications, such as autonomous driving and augmented reality, require running multiple deep neural networks (DNNs) concurrently, each with a different level of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPUs. Miriam combines two main components, an elastic-kernel generator and a runtime dynamic kernel coordinator, to support mixed-criticality DNN inference. To evaluate Miriam, we build a new CUDA-based DNN inference benchmark with diverse representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while incurring less than 10% latency overhead for critical tasks, compared to state-of-the-art baselines.

Tags: Benchmarking, Computer science, CUDA, Deep learning, Neural networks, nVidia, nVidia GeForce RTX 2060, nVidia Jetson TX2

July 16, 2023 by hgpu
