high performance computing on graphics processing units: hgpu.org

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, Huan Wang

Westlake University

arXiv:2602.11715 [cs.LG], (12 Feb 2026)

DOI:10.48550/arXiv.2602.11715

@misc{bai2026dice,
  title={DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels},
  author={Haolei Bai and Lingcheng Kong and Xueyi Chen and Jianmian Wang and Zhiqiang Tao and Huan Wang},
  year={2026},
  eprint={2602.11715},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.11715}
}

Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, hindered both by the highly specialized nature of the task and by a severe shortage of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales: 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.
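The abstract does not say how an infilling example is built, so the following is only a rough illustration of what a kernel infilling task could look like. It is not the authors' pipeline. The `<mask>` token, the line-level span selection, and the helper names `make_infilling_example` and `fill` are all assumptions made for this sketch.

```python
# Hypothetical illustration only: the mask token, span policy, and data format
# used by DICE are not described in the abstract.
MASK = "<mask>"  # assumed placeholder token

# A small CUDA kernel held as plain text; it is never compiled here.
KERNEL = """__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}"""


def make_infilling_example(source: str, start: int, end: int, mask: str = MASK):
    """Hide lines [start, end) of `source` and return (prompt, target)."""
    lines = source.splitlines()
    if not 0 <= start < end <= len(lines):
        raise ValueError("span out of range")
    target = "\n".join(lines[start:end])
    prompt = "\n".join(lines[:start] + [mask] + lines[end:])
    return prompt, target


def fill(prompt: str, completion: str, mask: str = MASK) -> str:
    """Substitute a completion back into the masked prompt."""
    return prompt.replace(mask, completion, 1)


# Mask the kernel body, keeping the signature and the closing brace visible.
prompt, target = make_infilling_example(KERNEL, 1, 5)
```

Here the model would see the signature and closing brace with the body hidden, and a correct completion restores the original kernel via `fill(prompt, target)`. An end-to-end generation stage would, by contrast, ask for the whole kernel from a task description alone.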

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, Package

February 16, 2026 by hgpu
