{"id":19149,"date":"2019-10-06T10:02:17","date_gmt":"2019-10-06T07:02:17","guid":{"rendered":"https:\/\/hgpu.org\/?p=19149"},"modified":"2019-10-06T10:02:17","modified_gmt":"2019-10-06T07:02:17","slug":"syntix-a-profiling-based-resource-estimator-for-cuda-kernels","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=19149","title":{"rendered":"Syntix: A Profiling Based Resource Estimator for CUDA Kernels"},"content":{"rendered":"<p>Trending applications such as AI and data analytics have mandated the use of GPUs in modern datacenters for performance reasons. Current practice dictates to dedicate GPUs to applications, which limits the amount of concurrent users to the available GPUs. That use of GPUs contradicts with the policy of datacenters to oversubscribe resources and accommodate as many user applications as possible. To address this issue, providers will inevitably resort to GPU sharing. In this work we introduce Syntix, a mechanism that we deploy on GPU sharing system and 1) profiles CUDA kernels in order to learn their resource requirements in terms of threads and blocks and 2) assigns those resources to kernels in order to be efficiently collocated into streams. Syntix is able to exploit the resources that are possibly wasted from the execution of an individual kernel and save the 80% of them on average.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Trending applications such as AI and data analytics have mandated the use of GPUs in modern datacenters for performance reasons. Current practice dictates to dedicate GPUs to applications, which limits the amount of concurrent users to the available GPUs. That use of GPUs contradicts with the policy of datacenters to oversubscribe resources and accommodate as [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,89,3],"tags":[1782,14,106,749,20,1939],"class_list":["post-19149","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-computer-science","tag-cuda","tag-gpu-cluster","tag-grid","tag-nvidia","tag-nvidia-quadro-k-2200"],"views":2049,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19149","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19149"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19149\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19149"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}