AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun
Department of Computer Science, Stanford University, USA
arXiv:2511.15915 [cs.LG] (19 Nov 2025)

@misc{zhang2025acceloptselfimprovingllmagentic,
   title={AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization},
   author={Genghan Zhang and Shaowei Zhu and Anjiang Wei and Zhenyu Song and Allen Nie and Zhen Jia and Nandita Vijaykumar and Yida Wang and Kunle Olukotun},
   year={2025},
   eprint={2511.15915},
   archivePrefix={arXiv},
   primaryClass={cs.LG},
   url={https://arxiv.org/abs/2511.15915}
}


We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt’s capability improves over time, boosting the average percentage of peak throughput from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being 26x cheaper.
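The loop the abstract describes (iterative generation guided by a memory of slow-fast kernel pairs) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `Experience`, `OptimizationMemory`, `propose`, and `optimize` are hypothetical, the LLM rewrite step is replaced by a random latency perturbation, and the memory policy (keep the pairs with the largest speedup) is just one plausible choice.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One slow-fast kernel pair plus a short note on what changed."""
    slow_latency: float
    fast_latency: float
    insight: str


@dataclass
class OptimizationMemory:
    """Bounded store of the largest speedups seen so far (hypothetical policy)."""
    capacity: int = 4
    items: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)
        # Keep only the pairs with the largest slow/fast speedup ratio.
        self.items.sort(key=lambda e: e.slow_latency / e.fast_latency, reverse=True)
        del self.items[self.capacity:]


def propose(latency, memory, rng):
    """Stand-in for an LLM rewrite: returns (new_latency, insight)."""
    # Assumption: each remembered insight makes a helpful rewrite a bit more likely.
    bias = 0.05 * len(memory.items)
    factor = rng.uniform(0.8 - bias, 1.2 - bias)
    return latency * factor, f"scaled latency by {factor:.2f}"


def optimize(baseline_latency, iterations=10, candidates=4, seed=0):
    """Iteratively sample candidates, keep improvements, and record them in memory."""
    rng = random.Random(seed)
    memory = OptimizationMemory()
    best = baseline_latency
    for _ in range(iterations):
        for _ in range(candidates):
            new_latency, insight = propose(best, memory, rng)
            if new_latency < best:
                # A slow-fast pair: the previous best and the faster candidate.
                memory.add(Experience(best, new_latency, insight))
                best = new_latency
    return best, memory
```

Calling `optimize(100.0)` returns the best latency found together with the retained experiences. In a real system the `propose` step would generate and profile actual kernels, so this sketch only shows the control flow, not the optimization itself.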