SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction
Shanghai Jiao Tong University, Shanghai, China
arXiv:2601.14910 [cs.PF], (21 Jan 2026)
@misc{zhang2026synperf,
  title={SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction},
  author={Kaixuan Zhang and Yunfan Cui and Shuhao Zhang and Chutong Ding and Shiyou Qian and Luping Wang and Jian Cao and Guangtao Xue and Cheng Huang and Guodong Yang and Liping Zhang},
  year={2026},
  eprint={2601.14910},
  archivePrefix={arXiv},
  primaryClass={cs.PF},
  url={https://arxiv.org/abs/2601.14910}
}
The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited: they generalize poorly across hardware and inadequately model the complex production-level kernels common in modern inference stacks. To address these issues, we present SynPerf, a unified GPU modeling framework. SynPerf first employs an analytical model to quantify a given kernel’s demands on the GPU’s heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model that captures complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types spanning four generations of major architectures and two widely used serving systems demonstrates that SynPerf delivers high fidelity and strong generalizability. It achieves accurate predictions, with only 6.1% average error at the kernel level and 8.5% for end-to-end inference, reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate SynPerf’s value “beyond simulation” by using its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup.
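The abstract describes a two-stage hybrid: an analytical front end that converts a kernel into per-pipeline demand features, and an ML back end that maps those features to latency. The sketch below illustrates that structure only; the specific feature set (roofline-style per-pipeline times), the `KernelWorkload`/`GpuSpec` fields, and the choice of a gradient-boosted regressor are assumptions for illustration, not the paper's actual features or model.

```python
# Minimal sketch of an analytical-then-ML performance predictor.
# All feature definitions and the regressor choice are illustrative assumptions.
from dataclasses import dataclass
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

@dataclass
class KernelWorkload:
    fp32_flops: float    # work issued to the FP32 pipeline
    tensor_flops: float  # work issued to the tensor-core pipeline
    dram_bytes: float    # bytes moved to/from HBM
    smem_bytes: float    # bytes moved through shared memory

@dataclass
class GpuSpec:
    fp32_tflops: float   # peak FP32 throughput (TFLOP/s)
    tensor_tflops: float # peak tensor-core throughput (TFLOP/s)
    dram_gbps: float     # peak HBM bandwidth (GB/s)
    smem_gbps: float     # aggregate shared-memory bandwidth (GB/s)

def analytical_features(k: KernelWorkload, g: GpuSpec) -> np.ndarray:
    """Stage 1: per-pipeline demand, expressed as the time (ms) each
    pipeline would need in isolation to serve the kernel's work."""
    return np.array([
        k.fp32_flops   / (g.fp32_tflops   * 1e9),  # FP32 pipeline demand
        k.tensor_flops / (g.tensor_tflops * 1e9),  # tensor-core demand
        k.dram_bytes   / (g.dram_gbps     * 1e6),  # DRAM demand
        k.smem_bytes   / (g.smem_gbps     * 1e6),  # shared-memory demand
    ])

# Stage 2: a regressor learns how the pipelines overlap and contend,
# mapping analytical demands to measured kernel latency.
model = GradientBoostingRegressor()

def fit(kernels, gpus, measured_ms):
    X = np.stack([analytical_features(k, g) for k, g in zip(kernels, gpus)])
    model.fit(X, measured_ms)

def predict_ms(kernel: KernelWorkload, gpu: GpuSpec) -> float:
    return float(model.predict(analytical_features(kernel, gpu)[None, :])[0])
```

Because the features already encode hardware peaks, a model trained on one set of GPUs can, in principle, be queried for an unseen GPU by supplying its spec sheet, which is the kind of cross-hardware generalization the paper targets.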
January 25, 2026 by hgpu