high performance computing on graphics processing units: hgpu.org

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Haojie Duanmu, Jin Fang, Chenli Xue, Chenhui Huang, Yuanqiang Liu, Renze Chen, Ningxin Zheng, Dongyang Wang, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu

Tags: Computer science, LLM, nVidia, nVidia H800, Package, Triton

May 11, 2026 by hgpu

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

Haohui Mai, Xiaoyan Guo, Xiangyun Ding, Daifeng Li, Qiuchu Yu, Chenzhun Guo, Cong Wang, Jiacheng Zhao, Christos Kozyrakis, Binhang Yuan

Tags: AMD, AMD Radeon Instinct MI300X, Computer science, DSL, LLM, Triton

May 3, 2026 by hgpu

A Human–Machine Collaborative Tuning Framework for Triton Kernel Optimization on SIMD Platforms

Xulin Zhou, Hongbin Zhang, Mingjie Xing

Tags: Auto-Tuning, Computer science, Evolutionary Computations, Triton

May 3, 2026 by hgpu

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

Tags: Computer science, CUBLAS, CUDA, LLM, nVidia, nVidia B200, nVidia H100, nVidia RTX PRO 6000, Package, Performance, Triton

May 3, 2026 by hgpu

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, Triton

April 13, 2026 by hgpu

* * *

Augmenting LLM Code Translation with Compiler Analysis for C to Triton Kernel Generation

Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization

The Correctness Illusion in LLM-Generated GPU Kernels

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC

KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

A Human–Machine Collaborative Tuning Framework for Triton Kernel Optimization on SIMD Platforms

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems