{"id":28878,"date":"2023-12-18T11:10:18","date_gmt":"2023-12-18T09:10:18","guid":{"rendered":"https:\/\/hgpu.org\/?p=28878"},"modified":"2023-12-18T11:10:18","modified_gmt":"2023-12-18T09:10:18","slug":"application-performance-profiling-on-intel-gpus-with-oneprof-and-onetrace","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=28878","title":{"rendered":"Application Performance Profiling on Intel GPUs with Oneprof and Onetrace"},"content":{"rendered":"<p>Modern supercomputing applications are complex programs built on optimized frameworks and accelerated on GPUs. Dedicated tools for profiling GPU kernel utilization and performance are therefore needed to support the development of these applications and, in turn, to accelerate progress in the scientific computing and machine learning communities. This paper presents the Oneprof and Onetrace tools from the Intel PTI-GPU framework. These tools can profile applications executing on Intel GPUs at different levels of the runtime stack. To demonstrate the features and utility of these tools, we examine one HPC application and one AI application.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Modern supercomputing applications are complex programs built on optimized frameworks and accelerated on GPUs. Dedicated tools for profiling GPU kernel utilization and performance are therefore needed to support the development of these applications and, in turn, to accelerate progress in the scientific computing and machine learning communities. This paper presents the Oneprof and Onetrace tools from [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,90,3],"tags":[1782,905,2133,1793,252,176,67],"class_list":["post-28878","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-computer-science","tag-intel","tag-intel-data-center-gpu-max-1550","tag-opencl","tag-openmp","tag-package","tag-performance"],"views":1714,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/28878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=28878"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/28878\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=28878"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=28878"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=28878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}