Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing
The Chinese University of Hong Kong
arXiv:2307.04339 [cs.DC], (10 Jul 2023)

@misc{zhao2023miriam,
   title={Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU},
   author={Zhihe Zhao and Neiwen Ling and Nan Guan and Guoliang Xing},
   year={2023},
   eprint={2307.04339},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}

Many applications, such as autonomous driving and augmented reality, require running multiple deep neural networks (DNNs) concurrently, each with a different level of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPUs. Miriam combines two main components, an elastic-kernel generator and a runtime dynamic kernel coordinator, to support mixed-criticality DNN inference. To evaluate Miriam, we build a new CUDA-based DNN inference benchmark with diverse, representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while incurring less than 10% latency overhead for critical tasks, compared to state-of-the-art baselines.