{"id":30793,"date":"2026-05-11T00:42:15","date_gmt":"2026-05-10T21:42:15","guid":{"rendered":"https:\/\/hgpu.org\/?p=30793"},"modified":"2026-05-11T00:42:15","modified_gmt":"2026-05-10T21:42:15","slug":"kerncap-automated-kernel-extraction-and-isolation-for-amd-gpus","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=30793","title":{"rendered":"Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs"},"content":{"rendered":"<p>Iterative GPU kernel tuning is bottlenecked by the scale of the applications that host the kernels. Rapid iteration requires isolating the kernel so it can be edited, recompiled, and validated without rebuilding the full application &#8212; but manual isolation requires reconstructing build flags, dispatch configuration, and runtime inputs by hand, so developers usually settle for slow in-place edits. We present Kerncap, an automated kernel extraction tool that intercepts dispatches at the HSA runtime for both HIP and Triton, bridging Triton&#8217;s JIT-only metadata into HSA-level capture via a lightweight Python compile-hook shim. Kerncap performs an address-space closure of all device memory &#8212; a virtual-address-faithful snapshot that preserves embedded device pointers without DWARF metadata or pointer chasing &#8212; locates kernel sources, and emits self-contained reproducer projects. HIP reproducers use a Clang VFS overlay for source-level recompilation without modifying the original build system; Triton reproducers are tuning-pinned, binding the captured autotuner configuration into the artifact to preserve the JIT kernel&#8217;s numerical contract.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Iterative GPU kernel tuning is bottlenecked by the scale of the applications that host the kernels. Rapid iteration requires isolating the kernel so it can be edited, recompiled, and validated without rebuilding the full application &#8212; but manual isolation requires reconstructing build flags, dispatch configuration, and runtime inputs by hand, so developers usually settle for [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3],"tags":[1438,2122,2159,2156,1782,2155,176,513,2167,2182],"class_list":["post-30793","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-amd","tag-amd-radeon-instinct-mi210","tag-amd-radeon-instinct-mi300x","tag-amd-radeon-pro-w7900","tag-computer-science","tag-llm","tag-package","tag-python","tag-rocm","tag-triton"],"views":420,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/30793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=30793"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/30793\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=30793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=30793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=30793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}