{"id":19428,"date":"2020-01-19T14:47:40","date_gmt":"2020-01-19T12:47:40","guid":{"rendered":"https:\/\/hgpu.org\/?p=19428"},"modified":"2020-01-19T14:47:40","modified_gmt":"2020-01-19T12:47:40","slug":"md_poly-a-performance-portable-polyhedral-compiler-based-on-multi-dimensional-homomorphisms","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=19428","title":{"rendered":"md_poly: A Performance-Portable Polyhedral Compiler Based on Multi-Dimensional Homomorphisms"},"content":{"rendered":"<p>Polyhedral compilers automatically parallelize sequential programs for multi- and many-core architectures, such as CPU and GPU. However, parallel code generated by state-ofthe-art polyhedral compilers often lacks performance portability, because the existing compilers are usually optimized toward only a single particular architecture (e.g., GPU). Moreover, even on their target architecture, polyhedral compilers sometimes fail to reach high performance, because they often miss important optimizations, e.g., efficiently exploiting fast memory resources. We present our work-in-progress results for md_poly \u2013 a novel polyhedral compiler that generates portable highperformance code from sequential C programs with perfect loop nests and rectangular iteration domains. In contrast to the existing polyhedral compilers, md_poly relies on the code generation approach for Multi-Dimensional Homomorphisms (MDHs): we show that the internal program representation of polyhedral compilers (a.k.a. polyhedral model) can be automatically transformed into an equivalent MDH representation; this representation is suitable for generating highperformance program code that is performance portable over different architectures. Our preliminary experimental comparison against PPCG with two benchmarks \u2013 Gaussian Convolution and Matrix Multiplication \u2013 shows encouraging results: speedups up to 7x on Intel CPU and 3x on NVIDIA GPU on real-world input sizes from deep learning.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Polyhedral compilers automatically parallelize sequential programs for multi- and many-core architectures, such as CPU and GPU. However, parallel code generated by state-ofthe-art polyhedral compilers often lacks performance portability, because the existing compilers are usually optimized toward only a single particular architecture (e.g., GPU). Moreover, even on their target architecture, polyhedral compilers sometimes fail to reach [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[11,90,3],"tags":[215,955,1782,20,1793,176,1586,1963],"class_list":["post-19428","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-opencl","category-paper","tag-code-generation","tag-compilers","tag-computer-science","tag-nvidia","tag-opencl","tag-package","tag-performance-portability","tag-tesla-v100"],"views":2708,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19428","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19428"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/19428\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19428"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19428"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19428"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}