
AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis

Jinye Du, Quan Yuan, Zuyao Zhang, Yanzhi Yi, Jiahui Hu, Wangyi Chen, Yiyang Zhu, Qishui Zheng, Wenxiang Zou, Xiangyu Chang, Zuohe Zheng, Zichun Ye, Chao Liu, Shanni Li, Renwei Zhang, Yiping Deng, Xinwei Hu, Xuefeng Jin, Jie Zhao
Huawei Technologies Co., Ltd.
arXiv:2512.23424 [cs.AI] (29 Dec 2025)

@misc{du2025akgkernelagentmultiagent,
   title={AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis},
   author={Jinye Du and Quan Yuan and Zuyao Zhang and Yanzhi Yi and Jiahui Hu and Wangyi Chen and Yiyang Zhu and Qishui Zheng and Wenxiang Zou and Xiangyu Chang and Zuohe Zheng and Zichun Ye and Chao Liu and Shanni Li and Renwei Zhang and Yiping Deng and Xinwei Hu and Xuefeng Jin and Jie Zhao},
   year={2025},
   eprint={2512.23424},
   archivePrefix={arXiv},
   primaryClass={cs.AI},
   url={https://arxiv.org/abs/2512.23424}
}


Modern AI models demand high-performance computation kernels. The growing complexity of LLMs, multimodal architectures, and recommendation systems, combined with techniques like sparsity and quantization, creates significant computational challenges. Moreover, frequent hardware updates and diverse chip architectures further complicate this landscape, requiring tailored kernel implementations for each platform. Manual optimization cannot keep pace with these demands, creating a critical bottleneck in AI system development. Recent advances in LLM code generation capabilities have opened new possibilities for automating kernel development. In this work, we propose AKG kernel agent (AI-driven Kernel Generator), a multi-agent system that automates kernel generation, migration, and performance tuning. AKG kernel agent is designed to support multiple domain-specific languages (DSLs), including Triton, TileLang, CPP, and CUDA-C, enabling it to target different hardware backends while maintaining correctness and portability. The system's modular design allows rapid integration of new DSLs and hardware targets. When evaluated on KernelBench using the Triton DSL across GPU and NPU backends, AKG kernel agent achieves an average speedup of 1.46x over PyTorch Eager baseline implementations, demonstrating its effectiveness in accelerating kernel development for modern AI workloads.
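For readers unfamiliar with the target DSL, the sketch below (not taken from the paper) illustrates the kind of artifact such an agent synthesizes: a minimal Triton vector-add kernel plus a correctness check against the PyTorch Eager result, mirroring how KernelBench-style harnesses validate generated kernels. The kernel, function names, and block size are illustrative assumptions, not the authors' generated code.

    # Illustrative sketch only: the kind of Triton kernel an LLM-driven agent
    # would synthesize to replace a PyTorch Eager op (names/sizes are assumptions).
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance processes one BLOCK_SIZE-wide tile.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements          # guard the tail tile
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    # Correctness check against the PyTorch Eager baseline, in the spirit of
    # KernelBench-style validation of generated kernels.
    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    assert torch.allclose(add(x, y), x + y)

In the system described above, kernels of this form would be generated, validated, and performance-tuned automatically for each backend rather than written by hand.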
