iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

Fei Xu, Jianian Xu, Jiabin Chen, Li Chen, Ruitao Shang, Zhi Zhou, Fangming Liu
Shanghai Key Laboratory of Multidimensional Information Processing, School of Computer Science and Technology, East China Normal University, 3663 N. Zhongshan Road, Shanghai 200062, China
arXiv:2211.01713 [cs.DC], 3 Nov 2022

@misc{xu2022igniter,
   author={Xu, Fei and Xu, Jianian and Chen, Jiabin and Chen, Li and Shang, Ruitao and Zhou, Zhi and Liu, Fangming},
   title={iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud},
   year={2022},
   eprint={2211.01713},
   archivePrefix={arXiv},
   primaryClass={cs.DC},
   keywords={Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences},
   copyright={arXiv.org perpetual, non-exclusive license}
}


GPUs are essential to accelerating latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as demonstrated by an empirical measurement study of DNN inference on EC2 GPU instances. While existing works on guaranteeing inference performance service level objectives (SLOs) focus on either temporal sharing of GPUs or reactive GPU resource scaling and inference migration techniques, how to proactively mitigate such severe performance interference has received comparatively little attention. In this paper, we propose iGniter, an interference-aware GPU resource provisioning framework for cost-efficiently achieving predictable DNN inference in the cloud. iGniter comprises two key components: (1) a lightweight DNN inference performance model, which leverages the system and workload metrics that are practically accessible to capture the performance interference; and (2) a cost-efficient GPU resource provisioning strategy that jointly optimizes the GPU resource allocation and adaptive batching based on our inference performance model, with the aim of achieving predictable performance of DNN inference workloads. We implement a prototype of iGniter based on the NVIDIA Triton inference server hosted on EC2 GPU instances. Extensive prototype experiments on four representative DNN models and datasets demonstrate that iGniter can guarantee the performance SLOs of DNN inference workloads with practically acceptable runtime overhead, while reducing the monetary cost by up to 25% in comparison to state-of-the-art GPU resource provisioning strategies.
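To make the two components concrete, below is a minimal, self-contained Python sketch of the general idea: a toy latency model that inflates solo-run latency by an interference term, plus a greedy search for the cheapest (GPU fraction, batch size) pair that still satisfies a workload's latency SLO and arrival rate. The model form, its coefficients, and every name here (Workload, predict_latency_ms, provision) are illustrative assumptions, not the paper's actual formulation or code.

# Hypothetical sketch in the spirit of iGniter: the latency model, its
# coefficients, and the search procedure are illustrative assumptions,
# not the paper's actual formulation.
from dataclasses import dataclass
from itertools import product

@dataclass
class Workload:
    name: str
    slo_ms: float        # latency service level objective (ms)
    rate_rps: float      # request arrival rate (requests/second)

def predict_latency_ms(batch, gpu_frac, interference,
                       alpha=2.0, beta=1.5):
    """Toy performance model: latency grows with batch size, shrinks with
    the allocated GPU fraction, and is inflated by a multiplicative
    interference slowdown from co-located workloads (all assumed forms)."""
    solo = alpha + beta * batch / gpu_frac   # solo-run latency estimate
    return solo * (1.0 + interference)       # interference-inflated latency

def provision(w, interference):
    """Greedy search: return the smallest GPU fraction (with a batch size)
    whose predicted latency meets the SLO and whose predicted throughput
    batch / latency covers the arrival rate."""
    best = None
    for gpu_frac, batch in product([f / 10 for f in range(1, 11)],
                                   range(1, 33)):
        lat = predict_latency_ms(batch, gpu_frac, interference)
        tput = batch / (lat / 1000.0)        # requests/second
        if lat <= w.slo_ms and tput >= w.rate_rps:
            if best is None or gpu_frac < best[0]:
                best = (gpu_frac, batch, lat)
    return best

if __name__ == "__main__":
    w = Workload(name="resnet50", slo_ms=50.0, rate_rps=200.0)
    print(provision(w, interference=0.15))   # assumed 15% slowdown

With the default (made-up) coefficients, the example prints a plan such as (0.4, 4, 19.55), i.e., 40% of a GPU with batch size 4 meets the 50 ms SLO under a 15% assumed interference slowdown; the real system would instead fit the model from measured system and workload metrics.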