{"id":8675,"date":"2012-12-18T23:43:28","date_gmt":"2012-12-18T21:43:28","guid":{"rendered":"http:\/\/hgpu.org\/?p=8675"},"modified":"2012-12-18T23:43:28","modified_gmt":"2012-12-18T21:43:28","slug":"parallelisation-of-shallow-water-simulation-for-heterogeneous-architectures","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=8675","title":{"rendered":"Parallelisation of Shallow Water Simulation for Heterogeneous Architectures"},"content":{"rendered":"<p>This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed. One is for a multi-core NUMA architecture, developed in OpenMP. The other one is for a many-core GPU-accelerated architecture and is developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. Each iteration involves the identification, analysis and improvement of a particular overhead. The process repeats until no further optimisations can be performed. The evaluation compares the two implementations. The final results show strengths and weaknesses in both implementations, reflecting the differences between the two architectures.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed. One is for a multi-core NUMA architecture, developed in OpenMP. The other one is for a many-core GPU-accelerated architecture and is developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. 
[&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[89,104,90,3],"tags":[14,1795,20,1231,1793,390],"class_list":["post-8675","post","type-post","status-publish","format-standard","hentry","category-nvidia-cuda","category-fluid-dynamics","category-opencl","category-paper","tag-cuda","tag-fluid-dynamics","tag-nvidia","tag-nvidia-quadro-fx-2000","tag-opencl","tag-thesis"],"views":2400,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8675","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8675"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/8675\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2
%2Fcategories&post=8675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8675"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}