{"id":8093,"date":"2012-08-21T12:56:22","date_gmt":"2012-08-21T09:56:22","guid":{"rendered":"http:\/\/hgpu.org\/?p=8093"},"modified":"2012-08-21T12:56:22","modified_gmt":"2012-08-21T09:56:22","slug":"fixing-performance-bugs-an-empirical-study-of-open-source-gpgpu-programs","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=8093","title":{"rendered":"Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs"},"content":{"rendered":"<p>Given the extraordinary computational power of modern graphics processing units (GPUs), general purpose computation on GPUs (GPGPU) has become an increasingly important platform for high performance computing. To better understand how well application developers utilize GPU resources, and to help them develop high performance GPGPU code, we conduct an empirical study of GPGPU programs from ten open-source projects. These projects span a wide range of disciplines, and many are designed as high performance libraries. In these projects, we found various performance &#8216;bugs&#8217;, i.e., code segments that lead to inefficient use of GPU hardware. We characterize these performance bugs and propose fixes for them. Our experiments confirm both significant performance gains and energy savings from our fixes and reveal interesting insights on different GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Given the extraordinary computational power of modern graphics processing units (GPUs), general purpose computation on GPUs (GPGPU) has become an increasingly important platform for high performance computing. 
To better understand how well application developers utilize GPU resources, and to help them develop high performance GPGPU code, we conduct an [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,90,3],"tags":[7,455,955,1782,14,20,251,379,1793,67],"class_list":["post-8093","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-opencl","category-paper","tag-ati","tag-ati-radeon-hd-5870","tag-compilers","tag-computer-science","tag-cuda","tag-nvidia","tag-nvidia-geforce-gtx-285","tag-nvidia-geforce-gtx-480","tag-opencl","tag-performance"],"views":2552,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8093","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8093"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8093\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8093"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\
/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8093"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8093"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}