PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian
School of Computer Science and Engineering, Beihang University, Beijing, China
arXiv:2511.06345 [cs.DC], (9 Nov 2025)

@misc{lei2025pragmaprofilingreasonedmultiagentframework,
   title={PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization},
   author={Kelun Lei and Hailong Yang and Huaitao Zhang and Xin You and Kaige Zhang and Zhongzhi Luan and Yi Liu and Depei Qian},
   year={2025},
   eprint={2511.06345},
   archivePrefix={arXiv},
   primaryClass={cs.DC},
   url={https://arxiv.org/abs/2511.06345}
}

Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution-time feedback and cannot reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. We evaluate PRAGMA on KernelBench, covering GPU and CPU backends. Results show that PRAGMA consistently outperforms the baseline AIKG without profiling enabled and achieves average speedups of 2.81× and 2.30× over Torch on CPU and GPU platforms, respectively.
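
The loop the abstract describes (evaluate a kernel, feed profiling data back, keep the best correct version, refine again) can be illustrated with a minimal Python sketch. This is not PRAGMA's implementation: `Candidate`, `refine`, `toy_evaluate`, and `toy_propose` are hypothetical names, the "profiler" and "LLM" are deterministic toy stand-ins, and the runtime model is invented purely so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    code: str          # kernel source (a plain string in this toy)
    correct: bool      # did it pass the correctness check?
    runtime_ms: float  # measured execution time
    profile: dict      # hypothetical profiler metrics


def refine(initial_code: str,
           propose: Callable[[str, Candidate], str],
           evaluate: Callable[[str], Candidate],
           rounds: int = 4) -> Optional[Candidate]:
    """Iteratively refine a kernel, always keeping the best correct version."""
    best: Optional[Candidate] = None
    last: Optional[Candidate] = None
    for _ in range(rounds):
        if last is None:
            code = initial_code
        else:
            # Build the next attempt on the historical best, not on a
            # possibly broken or slower latest attempt.
            base = best.code if best is not None else last.code
            code = propose(base, last)
        last = evaluate(code)
        if last.correct and (best is None or last.runtime_ms < best.runtime_ms):
            best = last
    return best


def toy_evaluate(code: str) -> Candidate:
    # Assumed toy model: each "+tile" halves runtime; "+bug" breaks correctness.
    tiles = code.count("+tile")
    bottleneck = "memory" if tiles < 2 else "compute"
    return Candidate(code, "+bug" not in code, 8.0 / (2 ** tiles),
                     {"bottleneck": bottleneck})


def toy_propose(base_code: str, last: Candidate) -> str:
    # Stand-in for an LLM call that reads the profile of the last attempt.
    if not last.correct or last.profile["bottleneck"] == "memory":
        return base_code + "+tile"
    return base_code + "+bug"  # a deliberately bad edit, to exercise rollback


best = refine("k", toy_propose, toy_evaluate, rounds=4)
print(best.code, best.runtime_ms)
```

In this toy run the final round produces an incorrect kernel, and `refine` still returns the earlier best version, which is the point of preserving historical bests.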