
Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum
Advanced Micro Devices, Inc. (AMD)
arXiv:2507.23194 [cs.CL] (31 Jul 2025)

@misc{wang2025geakintroducingtritonkernel,
   title={Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks},
   author={Jianghui Wang and Vinay Joshi and Saptarshi Majumder and Xu Chao and Bin Ding and Ziqiong Liu and Pratik Prabhanjan Brahma and Dong Li and Zicheng Liu and Emad Barsoum},
   year={2025},
   eprint={2507.23194},
   archivePrefix={arXiv},
   primaryClass={cs.CL},
   url={https://arxiv.org/abs/2507.23194}
}

The demand for AI-generated GPU kernels is rapidly growing, driven by the need for scalable, hardware-optimized solutions in both industry and academia. As deep learning workloads grow in complexity and diversity, it is imperative to automate low-level kernel development to meet performance and productivity demands. Major cloud providers, semiconductor companies, and research institutions are now investing heavily in AI-driven code generation for GPUs, aiming to reduce manual optimization effort while achieving near-expert performance on hardware such as the AMD MI300X. The Triton language, a Python-based DSL for GPU programming, has emerged as a popular target for such AI-generated kernels due to its balance of performance and ease of coding. In this work, we present an evaluation suite for Triton-based GPU kernels and GEAK (Generating Efficient AI-centric GPU Kernels), a framework that leverages cutting-edge LLMs to generate performant Triton code specifically for AMD GPUs, including the AMD MI300X and MI250. GEAK uses inference-time compute scaling to produce Triton-based GPU kernels through a reasoning loop adapted from Reflexion-style feedback mechanisms. On two evaluation benchmarks, GEAK significantly outperformed baselines that directly prompt frontier LLMs, as well as Reflexion-based generation pipelines, achieving correctness of up to 63% and execution speedups of up to 2.59X. These results highlight the promise of GEAK-like agentic code generation for accelerating the adoption of diverse hardware platforms and democratizing access to expert-level kernel performance.
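The abstract describes a Reflexion-style generate-test-refine loop but does not include code. The sketch below is not GEAK's implementation; it is a minimal illustration of that kind of loop under stated assumptions. The llm_generate helper is a hypothetical stand-in for whatever frontier-LLM call an agent would use, the test harness checks only a single vector-add case, and the assumed candidate entry point add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE) is an illustrative choice, not something defined by the paper.

import subprocess
import sys
import tempfile
import textwrap


def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a frontier-LLM call; not part of the paper."""
    raise NotImplementedError("plug in an LLM client here")


# Minimal correctness harness: the candidate kernel source is dropped into a
# small test script, executed in a subprocess, and any error text is captured
# so it can be fed back to the model on the next iteration.
TEST_TEMPLATE = textwrap.dedent("""
    import torch
    import triton
    import triton.language as tl

    {kernel_src}

    x = torch.randn(1024, device="cuda")
    y = torch.randn(1024, device="cuda")
    out = torch.empty_like(x)
    # Assumes the candidate exposes add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE).
    add_kernel[(4,)](x, y, out, x.numel(), BLOCK_SIZE=256)
    assert torch.allclose(out, x + y), "incorrect result"
""")


def run_candidate(kernel_src: str) -> tuple[bool, str]:
    """Run the candidate against the test script; return (passed, error feedback)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(TEST_TEMPLATE.format(kernel_src=kernel_src))
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=300)
    return proc.returncode == 0, proc.stderr


def reflexion_loop(task: str, max_iters: int = 5) -> str | None:
    """Generate -> test -> reflect: errors from failed runs are appended to the prompt."""
    feedback = ""
    for _ in range(max_iters):
        prompt = task if not feedback else (
            f"{task}\n\nYour previous attempt failed with:\n{feedback}\n"
            "Reflect on the error and produce a corrected kernel."
        )
        candidate = llm_generate(prompt)
        passed, feedback = run_candidate(candidate)
        if passed:
            return candidate
    return None

Inference-time compute scaling, as used by GEAK, amounts to spending more such iterations (or parallel attempts) at generation time rather than relying on a single model call.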