{"id":7373,"date":"2012-03-31T00:40:59","date_gmt":"2012-03-30T21:40:59","guid":{"rendered":"http:\/\/hgpu.org\/?p=7373"},"modified":"2012-03-31T00:45:14","modified_gmt":"2012-03-30T21:45:14","slug":"adaptive-input-aware-compilation-for-graphics-engines","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=7373","title":{"rendered":"Adaptive Input-aware Compilation for Graphics Engines"},"content":{"rendered":"<p>While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high-performance computations, the tedious process of performance tuning required to optimize applications is an obstacle to wider adoption of GPUs. In addition to the programmability challenges posed by the GPU\u2019s complex memory hierarchy and parallelism model, a well-known application design problem is target portability across different GPUs. However, even for a single GPU target, changing a program\u2019s input characteristics can make an already-optimized implementation of a program perform poorly. In this work, we propose Adaptic, an adaptive input-aware compilation system to tackle this important, yet overlooked, input portability problem. Using this system, programmers develop their applications in a high-level streaming language and let Adaptic undertake the difficult task of input-portable optimization and code generation. Several input-aware optimizations are introduced to make efficient use of the memory hierarchy and customize thread composition. At runtime, a properly optimized version of the application is executed based on the actual program input. We perform a head-to-head comparison between Adaptic-generated and hand-optimized CUDA programs. The results show that Adaptic is capable of generating code that performs on par with hand-optimized counterparts over certain input ranges and outperforms them when the input falls outside the hand-optimized programs\u2019 \u201ccomfort zone\u201d. 
Furthermore, we show that input-aware results are sustainable across different GPU targets, making it possible to write and optimize applications once and run them anywhere.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high-performance computations, the tedious process of performance tuning required to optimize applications is an obstacle to wider adoption of GPUs. In addition to the programmability challenges posed by the GPU\u2019s complex memory hierarchy and parallelism model, a well-known application design problem is target portability across [&hellip;]<\/p>\n","protected":false},"author":62,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3],"tags":[1299,14,1288,20,251,298,1300,1298,378],"class_list":["post-7373","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-compiler","tag-cuda","tag-gpu","tag-nvidia","tag-nvidia-geforce-gtx-285","tag-optimization","tag-portability","tag-streaming","tag-tesla-c2050"],"views":2557,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/62"}],"re
plies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7373"}],"version-history":[{"count":3,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7373\/revisions"}],"predecessor-version":[{"id":7376,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/7373\/revisions\/7376"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}