Kevin: Multi-Turn RL for Generating CUDA Kernels

Carlo Baronio, Pietro Marsella, Ben Pan, Simon Guo, Silas Alberti
Stanford University, Cognition AI
arXiv:2507.11948 [cs.LG] (16 Jul 2025)

@misc{baronio2025kevinmultiturnrlgenerating,
   title={Kevin: Multi-Turn RL for Generating CUDA Kernels},
   author={Carlo Baronio and Pietro Marsella and Ben Pan and Simon Guo and Silas Alberti},
   year={2025},
   eprint={2507.11948},
   archivePrefix={arXiv},
   primaryClass={cs.LG},
   url={https://arxiv.org/abs/2507.11948}
}

Writing GPU kernels is challenging and critical to the efficiency of AI systems. It is also highly iterative: domain experts write code and improve performance through execution feedback. Moreover, it presents verifiable rewards like correctness and speedup, making it a natural environment to apply Reinforcement Learning (RL). To explicitly incorporate the iterative nature of this process into training, we develop a flexible multi-turn RL recipe that addresses unique challenges encountered in real-world settings, such as learning from long trajectories and effective reward attribution across turns. We present Kevin – K(ernel D)evin, the first model trained with multi-turn RL for CUDA kernel generation and optimization. In our evaluation setup, Kevin shows significant gains over its base model (QwQ-32B), improving correctness of generated kernels (in pure CUDA) from 56% to 82% and mean speedup from 0.53x to 1.10x of baseline (PyTorch Eager), and surpassing frontier models like o4-mini (0.78x). Finally, we study its behavior across test-time scaling axes: we find that scaling serial refinement is more beneficial than parallel sampling. In particular, when given more refinement turns, Kevin shows a higher rate of improvement.
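The abstract names two verifiable reward signals (correctness and speedup) and the problem of attributing reward across refinement turns. The following Python sketch illustrates one plausible way to combine them; it is not taken from the paper. The `Turn` type, the `correctness_bonus` weight of 0.3, and the discount factor `gamma` of 0.4 are all hypothetical choices made for illustration, and the scheme shown (a discounted sum of the current and later turn scores) is only one of several possible attribution rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One refinement turn in a trajectory (hypothetical structure)."""
    correct: bool    # did the generated kernel pass the correctness check?
    speedup: float   # baseline_time / kernel_time; ignored when incorrect


def turn_score(turn: Turn, correctness_bonus: float = 0.3) -> float:
    """Score a single kernel: zero if incorrect, else bonus plus speedup.

    The 0.3 bonus is an illustrative assumption, not a value from the source.
    """
    if not turn.correct:
        return 0.0
    return correctness_bonus + turn.speedup


def discounted_returns(turns: List[Turn], gamma: float = 0.4) -> List[float]:
    """Credit each turn with its own score plus discounted later scores.

    An early turn that enables a good later kernel therefore still receives
    some reward. gamma = 0.4 is an illustrative assumption.
    """
    returns = [0.0] * len(turns)
    running = 0.0
    # Walk backwards so each turn sees the accumulated future reward.
    for i in range(len(turns) - 1, -1, -1):
        running = turn_score(turns[i]) + gamma * running
        returns[i] = running
    return returns


if __name__ == "__main__":
    trajectory = [
        Turn(correct=False, speedup=0.0),  # first attempt fails to compile or verify
        Turn(correct=True, speedup=0.8),   # correct but slower than the baseline
        Turn(correct=True, speedup=1.2),   # correct and faster than the baseline
    ]
    print(discounted_returns(trajectory))
```

Under this sketch, the failed first turn still earns a small positive return because it led to correct kernels later, which is the kind of cross-turn credit the abstract alludes to.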
* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors