high performance computing on graphics processing units: hgpu.org

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling

Guilin Zhang, Wulan Guo, Ziqi Tan, Qiang Guan, Hailong Jiang

Department of Engineering Management and Systems Engineering, George Washington University, USA

arXiv:2507.07932 [cs.DC], (10 Jul 2025)

DOI:10.48550/arXiv.2507.07932

@misc{zhang2025kissgpuawarekubernetesinference,
  title={KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling},
  author={Guilin Zhang and Wulan Guo and Ziqi Tan and Qiang Guan and Hailong Jiang},
  year={2025},
  eprint={2507.07932},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2507.07932}
}

Source code package: KISim: Kubernetes Intelligent Scheduling Simulator

Autoscaling GPU inference workloads in Kubernetes remains challenging because default mechanisms such as the Horizontal Pod Autoscaler (HPA) are reactive and threshold-based: they struggle under dynamic, bursty traffic patterns and lack integration with GPU-level metrics. We present KIS-S, a unified framework that combines KISim, a GPU-aware Kubernetes Inference Simulator, with KIScaler, a Proximal Policy Optimization (PPO)-based autoscaler. KIScaler learns latency-aware and resource-efficient scaling policies entirely in simulation and is deployed directly without retraining. Experiments across four traffic patterns show that KIScaler improves average reward by 75.2%, reduces P95 latency by up to 6.7x relative to CPU baselines, and generalizes without retraining. Our work bridges the gap between reactive autoscaling and intelligent orchestration for scalable GPU-accelerated environments.
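To make the "reactive and threshold-based" behavior concrete, here is a minimal sketch of the replica rule documented for the Kubernetes HPA: the desired count is the current count scaled by the ratio of observed to target metric, rounded up. It is not taken from the KIS-S code. The function name, the default bounds, and the example numbers are assumptions for illustration, and the real controller also applies stabilization windows and pod-readiness handling that are omitted here.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: ceil(current_replicas * current_metric / target_metric).

    Hypothetical helper for illustration only; bounds and tolerance are
    assumed defaults, not values from the paper.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# A burst is only acted on after the metric has already crossed the target.
replicas = 2
for observed_utilization in [55, 58, 90, 140, 60]:  # made-up percentages
    replicas = hpa_desired_replicas(replicas, observed_utilization, target_metric=60)
    print(observed_utilization, replicas)
```

The rule only responds to a metric that has already drifted from its target, which is the limitation the abstract contrasts with a learned, latency-aware policy.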

Tags: Computer science, nVidia, nVidia GeForce RTX 3080, Package, Task scheduling

July 13, 2025 by hgpu
