{"id":12303,"date":"2014-06-17T13:16:11","date_gmt":"2014-06-17T10:16:11","guid":{"rendered":"http:\/\/hgpu.org\/?p=12303"},"modified":"2014-06-17T13:19:11","modified_gmt":"2014-06-17T10:19:11","slug":"on-the-performance-portability-of-structured-grid-codes-on-many-core-computer-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=12303","title":{"rendered":"On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures"},"content":{"rendered":"<p>With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel\u2019s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area \u2014 structured grid codes \u2014 and investigated techniques for ensuring performance portability across a diverse range of different, high-end many-core architectures. We chose three codes to investigate: a 3D lattice Boltzmann code (D3Q19 BGK), the CloverLeaf hydrodynamics mini application from Sandia\u2019s Mantevo benchmark suite, and ROTORSIM, a production-quality structured grid, multiblock, compressible finite-volume CFD code. We have developed OpenCL versions of these codes in order to provide cross-platform functional portability, and compared the performance of the OpenCL versions of these structured grid codes to optimized versions on each platform, including hybrid OpenMP\/MPI\/AVX versions on CPUs and Xeon Phi, and CUDA versions on NVIDIA GPUs. Our results show that, contrary to conventional wisdom, using OpenCL it is possible to achieve a high degree of performance portability, at least for structured grid applications, using a set of straightforward techniques. The performance portable code in OpenCL is also highly competitive with the best performance using the native parallel programming models on each platform.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel\u2019s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area \u2014 structured grid codes \u2014 and investigated techniques for ensuring performance portability across [&hellip;]<\/p>\n","protected":false},"author":310,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[104,90,3],"tags":[1623,1795,1288,1483,1621,1793,1586,1624,1622],"class_list":["post-12303","post","type-post","status-publish","format-standard","hentry","category-fluid-dynamics","category-opencl","category-paper","tag-cloverleaf","tag-fluid-dynamics","tag-gpu","tag-intel-xeon-phi","tag-lattice-boltzmann","tag-opencl","tag-performance-portability","tag-rotorsim","tag-structured-grid"],"views":3186,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/12303","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/310"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12303"}],"version-history":[{"count":1,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/12303\/revisions"}],"predecessor-version":[{"id":12306,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/12303\/revisions\/12306"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12303"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12303"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}