high performance computing on graphics processing units: hgpu.org

Code generation

AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis

Jinye Du, Quan Yuan, Zuyao Zhang, Yanzhi Yi, Jiahui Hu, Wangyi Chen, Yiyang Zhu, Qishui Zheng, Wenxiang Zou, Xiangyu Chang, Zuohe Zheng, Zichun Ye, Chao Liu, Shanni Li, Renwei Zhang, Yiping Deng, Xinwei Hu, Xuefeng Jin, Jie Zhao

Tags: Code generation, Computer science, CUDA, DSL, LLM, nVidia, nVidia A100, Triton

January 12, 2026 by hgpu

DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation

Jiajun Jiao, Haowei Zhu, Puyuan Yang, Jianghui Wang, Ji Liu, Ziqiong Liu, Dong Li, Yuejian Fang, Junhai Yong, Bin Wang, Emad Barsoum

Tags: Benchmarking, Code generation, Computer science, LLM

January 12, 2026 by hgpu

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

Erel Kaplan, Tomer Bitan, Lian Ghrayeb, Le Chen, Tom Yotam, Niranjan Hasabnis, Gal Oren

Source codes

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia GeForce RTX 4060, OpenMP, Package

January 12, 2026 by hgpu

Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation

Le Chen, Nuo Xu, Winson Chen, Bin Lei, Pei-Hung Lin, Dunzhi Zhou, Rajeev Thakur, Caiwen Ding, Ali Jannesari, Chunhua Liao

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia H200

December 21, 2025 by hgpu

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

Jinwu Chen, Qidie Wu, Bin Li, Lin Ma, Xin Si, Yang Hu, Shouyi Yin, Jun Yang

Source codes

Tags: AI, Code generation, Computer science, CUDA, nVidia, nVidia A100, Package, PyTorch

December 21, 2025 by hgpu

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

Xinguo Zhu, Shaohui Peng, Jiaming Guo, Yunji Chen, Qi Guo, Yuanbo Wen, Hang Qin, Ruizhi Chen, Qirui Zhou, Ke Gao, Yanjun Wu, Chen Zhao, Ling Li

Tags: AI, Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, Triton

November 30, 2025 by hgpu

KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

Dezhi Ran, Shuxiao Xie, Mingfang Ji, Ziyue Hua, Mengzhou Wu, Yuan Cao, Yuzhe Guo, Yu Hao, Linyi Li, Yitao Hu, Tao Xie

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, nVidia GeForce RTX 4090, Triton

November 30, 2025 by hgpu

ProofWright: Towards Agentic Formal Verification of CUDA

Bodhisatwa Chatterjee, Drew Zagieboylo, Sana Damani, Siva Hari, Christos Kozyrakis

Tags: Code generation, Computer science, CUDA, LLM, nVidia

November 23, 2025 by hgpu

Inside VOLT: Designing an Open-Source GPU Compiler

Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine

Tags: Code generation, Compilers, Computer science, CUDA, nVidia, OpenCL

November 23, 2025 by hgpu

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, Performance

November 16, 2025 by hgpu

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, Caiwen Ding

Source codes

Tags: Code generation, Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3090, nVidia GeForce RTX 4090, nVidia RTX 6000 Ada, Package, Performance

November 9, 2025 by hgpu

A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines

Jonathan Strobl, Leonardo Solis-Vasquez, Yannick Lavan, Andreas Koch

Tags: AI, AMD, Code generation, Computer science, FPGA

October 26, 2025 by hgpu


