{"id":2811,"date":"2011-02-11T12:42:13","date_gmt":"2011-02-11T12:42:13","guid":{"rendered":"http:\/\/hgpu.org\/?p=2811"},"modified":"2011-02-11T12:42:13","modified_gmt":"2011-02-11T12:42:13","slug":"the-fast-multipole-method-on-parallel-clusters-multicore-processors-and-graphics-processing-units","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=2811","title":{"rendered":"The fast multipole method on parallel clusters, multicore processors, and graphics processing units"},"content":{"rendered":"<p>In this article, we discuss how the fast multipole method (FMM) can be implemented on modern parallel computers, ranging from computer clusters to multicore processors and graphics cards (GPU). The FMM is a somewhat difficult application for parallel computing because of its tree structure and the fact that it requires many complex operations which are not regularly structured. Computational linear algebra with dense matrices for example allows many optimizations that leverage the regular computation pattern. FMM can be similarly optimized but we will see that the complexity of the optimization steps is greater. The discussion will start with a general presentation of FMMs. We briefly discuss parallel methods for the FMM, such as building the FMM tree in parallel, and reducing communication during the FMM procedure. Finally, we will focus on porting and optimizing the FMM on GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we discuss how the fast multipole method (FMM) can be implemented on modern parallel computers, ranging from computer clusters to multicore processors and graphics cards (GPU). The FMM is a somewhat difficult application for parallel computing because of its tree structure and the fact that it requires many complex operations which are [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,89,104,3],"tags":[1787,14,723,1795,20,199,244],"class_list":["post-2811","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-nvidia-cuda","category-fluid-dynamics","category-paper","tag-algorithms","tag-cuda","tag-fast-multipole-method","tag-fluid-dynamics","tag-nvidia","tag-tesla-c1060","tag-tesla-s1070"],"views":1882,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/2811","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2811"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/2811\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}