{"id":2825,"date":"2011-02-13T20:47:06","date_gmt":"2011-02-13T20:47:06","guid":{"rendered":"http:\/\/hgpu.org\/?p=2825"},"modified":"2011-04-08T12:44:22","modified_gmt":"2011-04-08T12:44:22","slug":"comparison-of-gpu-architectures-for-asynchronous-communication-with-finite-differencing-applications","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=2825","title":{"rendered":"Comparison of GPU Architectures for Asynchronous Communication with Finite-Differencing Applications"},"content":{"rendered":"<p>Graphical Processing Units (GPUs) are good data-parallel performance accelerators for solving regular mesh partial differential equations (PDEs) whereby low-latency communications and high compute to communications ratios can yield very high levels of computational efficiency. Finite-difference time-domain methods still play an important role for many PDE applications. Iterative multi-grid and multilevel algorithms can converge faster than ordinary finite difference methods but can be much more difficult to parallelise with GPU memory constraints. We report on some practical algorithmic and data layout approaches and on performance data on a range of GPUs with CUDA. We focus on the use of multiple GPU devices with a single CPU host and the asynchronous CPU\/GPU communications issues involved. We obtain more than two orders of magnitude of speedup over a comparable CPU core.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Graphical Processing Units (GPUs) are good data-parallel performance accelerators for solving regular mesh partial differential equations (PDEs) whereby low-latency communications and high compute to communications ratios can yield very high levels of computational efficiency. Finite-difference time-domain methods still play an important role for many PDE applications. Iterative multi-grid and multilevel algorithms can converge faster than [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,89,3],"tags":[1782,14,327,20,253,379,550,551],"class_list":["post-2825","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-nvidia-cuda","category-paper","tag-computer-science","tag-cuda","tag-finite-difference","tag-nvidia","tag-nvidia-geforce-gtx-260","tag-nvidia-geforce-gtx-480","tag-partial-differential-equations","tag-pdes"],"views":1881,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/2825","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2825"}],"version-history":[{"count":1,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/2825\/revisions"}],"predecessor-version":[{"id":3512,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/2825\/revisions\/3512"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2825"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2825"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}