{"id":27331,"date":"2022-10-09T14:47:30","date_gmt":"2022-10-09T11:47:30","guid":{"rendered":"https:\/\/hgpu.org\/?p=27331"},"modified":"2022-10-09T14:47:30","modified_gmt":"2022-10-09T11:47:30","slug":"benchmarking-optimization-algorithms-for-auto-tuning-gpu-kernels","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=27331","title":{"rendered":"Benchmarking optimization algorithms for auto-tuning GPU kernels"},"content":{"rendered":"<p>Recent years have witnessed phenomenal growth in the application, and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly-efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete, and non-convex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels if the time-budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from 9 different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric based on the PageRank centrality concept as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with observed tuning performance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent years have witnessed phenomenal growth in the application, and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,89,90,3],"tags":[2060,7,1856,451,1782,14,20,2066,1957,1767,2023,1793,67,1927,980,1390,1931,1963],"class_list":["post-27331","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-opencl","category-paper","tag-amd-radeon-instinct-mi50","tag-ati","tag-auto-tuning","tag-benchmarking","tag-computer-science","tag-cuda","tag-nvidia","tag-nvidia-a100","tag-nvidia-geforce-gtx-1080-ti","tag-nvidia-geforce-gtx-titan-x","tag-nvidia-titan-rtx","tag-opencl","tag-performance","tag-pycuda","tag-pyopencl","tag-tesla-k20","tag-tesla-p100","tag-tesla-v100"],"views":1705,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/27331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=27331"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/27331\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=27331"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=27331"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=27331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}