{"id":30509,"date":"2026-01-25T21:51:44","date_gmt":"2026-01-25T19:51:44","guid":{"rendered":"https:\/\/hgpu.org\/?p=30509"},"modified":"2026-01-25T21:59:13","modified_gmt":"2026-01-25T19:59:13","slug":"synperf-a-hybrid-analytical-ml-framework-for-gpu-performance-prediction","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=30509","title":{"rendered":"SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction"},"content":{"rendered":"<p>The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited, exhibiting poor generalization across hardware and inadequate modeling of complex production-level kernels common in modern inference stacks. To address these issues, we present SyncPerf, a unified GPU modeling framework. This approach first employs an analytical model to quantify a given kernel&#8217;s demands on the GPU&#8217;s heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model to capture complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types from four generations of major architectures on two widely-used serving systems demonstrates that SyncPerf delivers high fidelity and strong generalizability. It achieves accurate predictions, with only 6.1% average error at the kernel level and 8.5% for end-to-end inference &#8211; reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate SynPerf&#8217;s value &#8220;beyond simulation&#8221; by utilizing its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. 
Tags: Computer science, CUDA, Heterogeneous systems, Machine learning, nVidia, nVidia A100, nVidia A40, nVidia H100, nVidia H20, nVidia H200, nVidia H800, nVidia L20, nVidia L40, nVidia RTX 6000 Ada, Performance, Triton
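The two-stage design described in the abstract lends itself to a compact illustration. The sketch below is a minimal Python mock-up of the idea, not SynPerf itself: the pipeline feature set, GPU throughput numbers, and training data are all invented stand-ins, and scikit-learn's GradientBoostingRegressor substitutes for whatever ML model the paper actually trains. It only shows the shape of the hybrid: analytical per-pipeline demand features in, learned latency prediction out.

```python
# Minimal sketch of a hybrid analytical-ML performance model.
# Everything numeric here is an illustrative assumption, not SynPerf's data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def analytical_features(kernel, gpu):
    # Hypothetical analytical stage: the cycle demand the kernel would
    # place on each heterogeneous instruction pipeline in isolation.
    return np.array([
        kernel["fma_ops"] / gpu["fma_per_cycle"],        # FP32/FMA pipeline
        kernel["tensor_ops"] / gpu["tensor_per_cycle"],  # tensor-core pipeline
        kernel["mem_bytes"] / gpu["bytes_per_cycle"],    # memory pipeline
        kernel["sfu_ops"] / gpu["sfu_per_cycle"],        # special-function pipeline
    ])

# Illustrative (made-up) per-cycle throughputs for one GPU.
gpu = {"fma_per_cycle": 128, "tensor_per_cycle": 512,
       "bytes_per_cycle": 64, "sfu_per_cycle": 16}

# Fabricated stand-in for profiled training data: random kernels with a
# synthetic latency that includes cross-pipeline interaction noise.
rng = np.random.default_rng(0)
kernels = [
    {"fma_ops": int(rng.integers(10_000, 1_000_000)),
     "tensor_ops": int(rng.integers(10_000, 1_000_000)),
     "mem_bytes": int(rng.integers(100_000, 10_000_000)),
     "sfu_ops": int(rng.integers(1_000, 100_000))}
    for _ in range(500)
]
X = np.array([analytical_features(k, gpu) for k in kernels])
# A purely analytical bound would take the slowest pipeline, max(X, axis=1);
# the ML stage learns the residual overlap/contention on top of it.
y = X.max(axis=1) * (1.0 + 0.2 * rng.random(len(X)))

model = GradientBoostingRegressor().fit(X, y)
print(f"predicted cycles for kernel 0: {model.predict(X[:1])[0]:,.0f}")
```

Grounding the regressor in analytical features rather than raw kernel parameters is what the abstract credits for cross-hardware generalization: the hardware-specific arithmetic lives in the analytical stage, so the ML stage only has to learn interaction effects.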