The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
Sakana AI, 2025
@article{lange2025cuda,
  title={The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition},
  author={Lange, Robert Tjarko and Prasad, Aaditya and Sun, Qi and Faldor, Maxence and Tang, Yujin and Ha, David},
  year={2025}
}
Recent advances in Large Language Models have driven large-scale deployment, resulting in ever-growing inference time and energy demand. While manual optimization of low-level code implementations is feasible, it is an arduous task that requires deep expertise to balance the complex interplay of algorithmic, software, and hardware bottlenecks. This report presents the first comprehensive agentic framework for fully automatic CUDA kernel discovery and optimization, enabling frontier large language models to translate torch code to CUDA kernels and then iteratively improve their runtime. We introduce The AI CUDA Engineer, which acts in sequential stages. First, it translates raw PyTorch code into equivalent CUDA kernels. Next, it optimizes their runtime performance using a novel evolutionary meta-generation procedure tailored towards the CUDA ecosystem. Finally, it uses an innovation archive of discovered 'stepping stone' kernels to improve future performance on new tasks. The AI CUDA Engineer can produce CUDA kernels that exceed the performance of torch-native and compiled kernels. Out of the 250 tasks tested, The AI CUDA Engineer successfully optimizes 186 tasks to a median speedup of 1.52x. For operations such as fused 3D convolutions or Diagonal Matrix Multiplication, we show runtime improvements ≥50x over their torch implementations. Alongside this report, we release the best discovered kernels, an accompanying dataset of all discovered kernels, and an interactive webpage for exploration of the results.
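To give intuition for the Diagonal Matrix Multiplication speedup the abstract mentions, here is a minimal NumPy sketch (an illustration of the general fusion idea, not code from the paper's released kernels): multiplying `diag(d) @ B` naively materializes an N×N matrix and performs a full O(N³) matmul, while the fused form is just a broadcasted row-wise scaling in O(N²).

```python
import numpy as np

# Illustrative example (assumed, not from the paper): diagonal matmul fusion.
N = 1024
d = np.random.rand(N)      # diagonal entries of an N x N diagonal matrix
B = np.random.rand(N, N)

# Naive: build the dense diagonal matrix, then run a full matrix multiply.
naive = np.diag(d) @ B

# Fused: scale row i of B by d[i] via broadcasting -- no N x N temporary,
# no cubic-cost matmul. This is the kind of rewrite a fused kernel exploits.
fused = d[:, None] * B

assert np.allclose(naive, fused)
```

The same principle applies on the GPU: a handwritten or discovered CUDA kernel that performs the scaling directly avoids both the extra memory traffic and the redundant multiply-adds of the generic matmul path, which is where order-of-magnitude speedups over an unfused torch expression can come from.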
February 24, 2025 by hgpu