SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction
Shanghai Jiao Tong University, Shanghai, China
arXiv:2601.14910 [cs.PF], (21 Jan 2026)
@misc{zhang2026synperf,
  title={SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction},
  author={Kaixuan Zhang and Yunfan Cui and Shuhao Zhang and Chutong Ding and Shiyou Qian and Luping Wang and Jian Cao and Guangtao Xue and Cheng Huang and Guodong Yang and Liping Zhang},
  year={2026},
  eprint={2601.14910},
  archivePrefix={arXiv},
  primaryClass={cs.PF},
  url={https://arxiv.org/abs/2601.14910}
}
The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited: they generalize poorly across hardware and inadequately model the complex production-level kernels common in modern inference stacks. To address these issues, we present SynPerf, a unified GPU modeling framework. SynPerf first employs an analytical model to quantify a given kernel’s demands on the GPU’s heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model that captures complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types spanning four generations of major architectures and two widely used serving systems demonstrates that SynPerf delivers high fidelity and strong generalizability. It achieves accurate predictions, with only 6.1% average error at the kernel level and 8.5% for end-to-end inference, reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate SynPerf’s value “beyond simulation” by using its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup.
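The abstract describes a two-stage hybrid: an analytical front end that converts a kernel into per-pipeline demand features, and an ML back end that maps those features to latency. The sketch below illustrates that structure only; the specific feature set (roofline-style per-pipeline times), the `KernelWorkload`/`GpuSpec` fields, and the choice of a gradient-boosted regressor are assumptions for illustration, not the paper's actual features or model.

```python
# Minimal sketch of an analytical-then-ML performance predictor.
# All feature definitions and the regressor choice are illustrative assumptions.
from dataclasses import dataclass
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

@dataclass
class KernelWorkload:
    fp32_flops: float    # work issued to the FP32 pipeline
    tensor_flops: float  # work issued to the tensor-core pipeline
    dram_bytes: float    # bytes moved to/from HBM
    smem_bytes: float    # bytes moved through shared memory

@dataclass
class GpuSpec:
    fp32_tflops: float   # peak FP32 throughput (TFLOP/s)
    tensor_tflops: float # peak tensor-core throughput (TFLOP/s)
    dram_gbps: float     # peak HBM bandwidth (GB/s)
    smem_gbps: float     # aggregate shared-memory bandwidth (GB/s)

def analytical_features(k: KernelWorkload, g: GpuSpec) -> np.ndarray:
    """Stage 1: per-pipeline demand, expressed as the time (ms) each
    pipeline would need in isolation to serve the kernel's work."""
    return np.array([
        k.fp32_flops   / (g.fp32_tflops   * 1e9),  # FP32 pipeline demand
        k.tensor_flops / (g.tensor_tflops * 1e9),  # tensor-core demand
        k.dram_bytes   / (g.dram_gbps     * 1e6),  # DRAM demand
        k.smem_bytes   / (g.smem_gbps     * 1e6),  # shared-memory demand
    ])

# Stage 2: a regressor learns how the pipelines overlap and contend,
# mapping analytical demands to measured kernel latency.
model = GradientBoostingRegressor()

def fit(kernels, gpus, measured_ms):
    X = np.stack([analytical_features(k, g) for k, g in zip(kernels, gpus)])
    model.fit(X, measured_ms)

def predict_ms(kernel: KernelWorkload, gpu: GpuSpec) -> float:
    return float(model.predict(analytical_features(kernel, gpu)[None, :])[0])
```

Because the features already encode hardware peaks, a model trained on one set of GPUs can, in principle, be queried for an unseen GPU by supplying its spec sheet, which is the kind of cross-hardware generalization the paper targets.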
January 25, 2026 by hgpu