PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

Muhammad Usman Tariq, Abhinav Jangda, Angelica Moreira, Madan Musuvathi, Tyler Sorensen
Stanford University, USA
arXiv:2512.19018 [cs.SE], 22 Dec 2025

@misc{tariq2025peakperformanceengineeringaiassistant,
   title={PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations},
   author={Muhammad Usman Tariq and Abhinav Jangda and Angelica Moreira and Madan Musuvathi and Tyler Sorensen},
   year={2025},
   eprint={2512.19018},
   archivePrefix={arXiv},
   primaryClass={cs.SE},
   url={https://arxiv.org/abs/2512.19018}
}

Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code. This challenge is exacerbated in the domain of GPU kernels, where performance-critical details are coupled to rapidly evolving hardware characteristics and available code examples are sparse. In this work, we introduce PEAK, a Performance Engineering AI-Assistant for GPU Kernels powered by natural language transformations. PEAK utilizes the key insight that iterative code transformations (optimizations) can straightforwardly be written in natural language and then carried out by LLMs. Thus, these transformations can be rapidly developed, encoding general portable optimizations, but also easily specialized to specific GPU devices and even to individual kernels. These natural transformations are supported by a modular and extensible infrastructure that additionally performs validation and performance evaluation. We demonstrate the flexibility of PEAK by instantiating it for three backends (CUDA, HIP, and HLSL) and creating 16 natural transformations for optimizing matrix multiplication kernels. We show that our resulting implementations are competitive with vendor libraries when available, and for HLSL (which lacks a vendor library) our implementations match the hardware-documented FLOPS. PEAK allows the fine-grained exploration of several research questions around how LLMs behave in this domain, including characterizing transformations and their errors, and how performance evolves along optimization sequences. PEAK provides an interface that can either be utilized by performance engineers to improve productivity, or driven completely autonomously (e.g., by an AI agent), providing a forward-compatible design that can continue to improve with advances in AI capabilities.
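The abstract describes an iterative loop in which natural-language transformations are applied by an LLM, validated for correctness, and benchmarked before being kept. The sketch below illustrates that control flow in Python. All names and the stub implementations are hypothetical (the paper's actual API is not public in this abstract); a real system would replace the stubs with an LLM call, a compile-and-compare correctness check, and on-device timing.

```python
# Hypothetical sketch of PEAK's transform-validate-benchmark loop.
# Every function here is an illustrative stand-in, not the paper's API.

def apply_transform(kernel_src: str, nl_transform: str) -> str:
    """Stand-in for an LLM call that rewrites kernel_src as directed by a
    natural-language transformation (e.g. "tile the loop into shared memory")."""
    return kernel_src + f"\n// applied: {nl_transform}"

def validate(kernel_src: str) -> bool:
    """Stand-in for compiling the candidate kernel and checking its output
    against a reference implementation."""
    return kernel_src.strip() != ""

def measure_gflops(kernel_src: str) -> float:
    """Stand-in for benchmarking the kernel on the target GPU.
    Here it just counts applied transformations so the loop is runnable."""
    return float(kernel_src.count("// applied:"))

def optimize(kernel_src: str, transforms: list[str]) -> tuple[str, float]:
    """Greedily apply each natural-language transformation in sequence,
    keeping a candidate only if it still validates and does not regress."""
    best_src, best_perf = kernel_src, measure_gflops(kernel_src)
    for t in transforms:
        candidate = apply_transform(best_src, t)
        if not validate(candidate):
            continue  # reject transformations that break correctness
        perf = measure_gflops(candidate)
        if perf >= best_perf:
            best_src, best_perf = candidate, perf
    return best_src, best_perf

if __name__ == "__main__":
    src, perf = optimize(
        "__global__ void matmul(float* A, float* B, float* C) { /* ... */ }",
        ["coalesce global loads", "tile into shared memory"],
    )
    print(perf)
```

The greedy keep-if-not-worse policy is only one possible driver; the abstract notes the same interface can instead be steered interactively by a performance engineer or by an autonomous agent.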


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
