A Human–Machine Collaborative Tuning Framework for Triton Kernel Optimization on SIMD Platforms

Xulin Zhou, Hongbin Zhang, Mingjie Xing
Institute of Software, Chinese Academy of Sciences, Beijing, China

@article{zhou2026human,
   title={A Human–Machine Collaborative Tuning Framework for Triton Kernel Optimization on SIMD Platforms},
   author={Zhou, Xulin and Zhang, Hongbin and Xing, Mingjie},
   year={2026}
}


Single Instruction, Multiple Data (SIMD) technology enhances performance through parallel data processing on CPUs. SIMD platforms are widely adopted across domains ranging from high-performance computing to AI inference. As modern AI workloads increasingly rely on Python-based kernel frameworks to maintain usability and benefit from automatic tuning, Triton has emerged as a representative solution. However, Triton’s autotuning mechanism, designed primarily for NVIDIA GPUs, fails to effectively exploit the architectural features of SIMD CPUs, creating a significant performance gap on these platforms. To address this problem, we introduce a human–machine collaborative design tailored for Triton kernel tuning on SIMD platforms. This design improves both development efficiency and performance by capturing high-level SIMD optimization intent from human users and integrating it seamlessly into machine framework tuning. Based on this collaborative design, we develop a tuning framework composed of a front-end for user intent recognition and a back-end for user-guided, SIMD-aware tuning. Experiments on x86 and RISC-V platforms show an average performance improvement of 31.7% over native Triton tuning, with tuning cost reduced by up to 75.0%.
