Scaling On-Device GPU Inference for Large Generative Models
Google LLC
arXiv:2505.00232 [cs.LG], 1 May 2025
@misc{tang2025scalingondevicegpuinference,
  title={Scaling On-Device GPU Inference for Large Generative Models},
  author={Jiuqiang Tang and Raman Sarokin and Ekaterina Ignasheva and Grant Jensen and Lin Chen and Juhyun Lee and Andrei Kulik and Matthias Grundmann},
  year={2025},
  eprint={2505.00232},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.00232}
}
Driven by advancements in generative AI, large machine learning models have revolutionized domains such as image processing, audio synthesis, and speech recognition. While server-based deployments remain the locus of peak performance, the imperative for on-device inference, motivated by privacy and efficiency considerations, persists. Recognizing GPUs as the on-device ML accelerator with the widest reach, we present ML Drift, an optimized framework that extends the capabilities of state-of-the-art GPU-accelerated inference engines. ML Drift enables on-device execution of generative AI workloads with 10 to 100x more parameters than existing on-device generative AI models. ML Drift addresses intricate engineering challenges associated with cross-GPU API development and ensures broad compatibility across mobile and desktop/laptop platforms, thereby facilitating the deployment of significantly more complex models on resource-constrained devices. Our GPU-accelerated ML/AI inference engine achieves an order-of-magnitude performance improvement over existing open-source GPU inference engines.
May 4, 2025 by hgpu