high performance computing on graphics processing units: hgpu.org

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian

School of Computer Science and Engineering, Beihang University, Beijing, China

arXiv:2511.06345 [cs.DC], (9 Nov 2025)

DOI:10.48550/arXiv.2511.06345

@misc{lei2025pragmaprofilingreasonedmultiagentframework,
  title={PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization},
  author={Kelun Lei and Hailong Yang and Huaitao Zhang and Xin You and Kaige Zhang and Zhongzhi Luan and Yi Liu and Depei Qian},
  year={2025},
  eprint={2511.06345},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2511.06345}
}

Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution-time feedback and cannot reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. We evaluate PRAGMA on KernelBench, covering GPU and CPU backends. Results show that PRAGMA consistently outperforms the baseline AIKG without profiling enabled and achieves average speedups of 2.81× and 2.30× over Torch on CPU and GPU platforms, respectively.
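The loop described in the abstract (generate a kernel, check correctness, measure execution time, profile, keep the best version, refine) can be illustrated with a minimal Python sketch. This is not PRAGMA's implementation: the names `Candidate`, `refine_loop`, `generate`, `check`, `measure`, and `profile` are hypothetical, and the demo replaces the LLM, the benchmark harness, and the hardware profiler with toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    source: str        # kernel source text (opaque in this sketch)
    runtime_ms: float  # measured execution time
    profile: dict      # profiler metrics, e.g. {"bottleneck": "memory"}


def refine_loop(
    generate: Callable[[Optional[str], dict], str],
    check: Callable[[str], bool],
    measure: Callable[[str], float],
    profile: Callable[[str], dict],
    rounds: int,
) -> Optional[Candidate]:
    """Hypothetical profile-guided refinement loop (an assumption, not the paper's code)."""
    best: Optional[Candidate] = None
    feedback: dict = {}
    for _ in range(rounds):
        # Each round starts from the best version found so far, plus the latest feedback.
        source = generate(best.source if best else None, feedback)
        if not check(source):
            # Correctness feedback only; an incorrect kernel is never timed or profiled.
            feedback = {"error": "incorrect output"}
            continue
        cand = Candidate(source, measure(source), profile(source))
        if best is None or cand.runtime_ms < best.runtime_ms:
            best = cand  # preserve the historical best version
        # Timing and profiling feedback both feed the next generation step.
        feedback = {"runtime_ms": cand.runtime_ms, **cand.profile}
    return best


def demo() -> Optional[Candidate]:
    # Toy stand-ins: fixed variants with made-up runtimes; "bad" fails the check.
    runtimes = {"v0": 10.0, "v1": 4.0, "v2": 6.0}
    variants = iter(["v0", "bad", "v1", "v2"])
    return refine_loop(
        generate=lambda base, fb: next(variants),
        check=lambda src: src in runtimes,
        measure=lambda src: runtimes[src],
        profile=lambda src: {"bottleneck": "memory"},
        rounds=4,
    )
```

In the toy run, `demo()` returns the `v1` candidate: the later `v2` variant is slower, so the stored best version is kept rather than overwritten. The runtimes are invented for illustration and are unrelated to the paper's results.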

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, Performance

November 16, 2025 by hgpu

Recent source codes

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Vortex-Optimized Light-weight Toolchain (VOLT)

SciDef: Automated Definition Extraction from Scientific Literature

Theorizer: from the paper Generating Literature-Driven Scientific Discoveries at Scale

bioagent-bench: Benchmark for evaluating LLM agents in bioinformatics

Benchmark suite for LLM inference on NVIDIA consumer GPUs

Nsight Python: a Python kernel profiling interface based on NVIDIA Nsight Tools

Awesome LLM-Driven Kernel Generation
