PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

Muhammad Usman Tariq, Abhinav Jangda, Angelica Moreira, Madan Musuvathi, Tyler Sorensen
Stanford University, USA
arXiv:2512.19018 [cs.SE], 22 Dec 2025

@misc{tariq2025peakperformanceengineeringaiassistant,
   title={PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations},
   author={Muhammad Usman Tariq and Abhinav Jangda and Angelica Moreira and Madan Musuvathi and Tyler Sorensen},
   year={2025},
   eprint={2512.19018},
   archivePrefix={arXiv},
   primaryClass={cs.SE},
   url={https://arxiv.org/abs/2512.19018}
}

Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code. This challenge is exacerbated in the domain of GPU kernels, where performance-critical details are coupled to rapidly evolving hardware characteristics and available code examples are sparse. In this work, we introduce PEAK, a Performance Engineering AI-Assistant for GPU Kernels powered by natural language transformations. PEAK utilizes the key insight that iterative code transformations (optimizations) can straightforwardly be written in natural language and then carried out by LLMs. Thus, these transformations can be rapidly developed, encoding general portable optimizations, but also easily specialized to specific GPU devices and even to individual kernels. These natural transformations are supported by a modular and extensible infrastructure that additionally performs validation and performance evaluation. We demonstrate the flexibility of PEAK by instantiating it for three backends (CUDA, HIP, and HLSL) and creating 16 natural transformations for optimizing matrix multiplication kernels. We show that our resulting implementations are competitive with vendor libraries when available, and for HLSL (which lacks a vendor library) our implementations match the hardware-documented FLOPS. PEAK allows the fine-grained exploration of several research questions around how LLMs behave in this domain, including characterizing transformations and their errors, and how performance evolves along optimization sequences. PEAK provides an interface that can either be utilized by performance engineers to improve productivity, or driven completely autonomously (e.g., by an AI agent), providing a forward-compatible design that can continue to improve with advances in AI capabilities.
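The abstract describes an iterative loop in which natural-language transformations are applied by an LLM, validated for correctness, and benchmarked before being kept. The sketch below illustrates that control flow in Python. All names and the stub implementations are hypothetical (the paper's actual API is not public in this abstract); a real system would replace the stubs with an LLM call, a compile-and-compare correctness check, and on-device timing.

```python
# Hypothetical sketch of PEAK's transform-validate-benchmark loop.
# Every function here is an illustrative stand-in, not the paper's API.

def apply_transform(kernel_src: str, nl_transform: str) -> str:
    """Stand-in for an LLM call that rewrites kernel_src as directed by a
    natural-language transformation (e.g. "tile the loop into shared memory")."""
    return kernel_src + f"\n// applied: {nl_transform}"

def validate(kernel_src: str) -> bool:
    """Stand-in for compiling the candidate kernel and checking its output
    against a reference implementation."""
    return kernel_src.strip() != ""

def measure_gflops(kernel_src: str) -> float:
    """Stand-in for benchmarking the kernel on the target GPU.
    Here it just counts applied transformations so the loop is runnable."""
    return float(kernel_src.count("// applied:"))

def optimize(kernel_src: str, transforms: list[str]) -> tuple[str, float]:
    """Greedily apply each natural-language transformation in sequence,
    keeping a candidate only if it still validates and does not regress."""
    best_src, best_perf = kernel_src, measure_gflops(kernel_src)
    for t in transforms:
        candidate = apply_transform(best_src, t)
        if not validate(candidate):
            continue  # reject transformations that break correctness
        perf = measure_gflops(candidate)
        if perf >= best_perf:
            best_src, best_perf = candidate, perf
    return best_src, best_perf

if __name__ == "__main__":
    src, perf = optimize(
        "__global__ void matmul(float* A, float* B, float* C) { /* ... */ }",
        ["coalesce global loads", "tile into shared memory"],
    )
    print(perf)
```

The greedy keep-if-not-worse policy is only one possible driver; the abstract notes the same interface can instead be steered interactively by a performance engineer or by an autonomous agent.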


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
