{"id":3606,"date":"2011-04-15T20:42:34","date_gmt":"2011-04-15T20:42:34","guid":{"rendered":"http:\/\/hgpu.org\/?p=3606"},"modified":"2011-04-15T20:42:34","modified_gmt":"2011-04-15T20:42:34","slug":"n-body-simulation-for-astronomical-collisional-systems-with-a-new-simd-instruction-set-extension-to-the-x86-architecture-advanced-vector-extensions","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=3606","title":{"rendered":"N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions"},"content":{"rendered":"<p>We present a high-performance N-body code for astronomical collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). On one core of an Intel Core i7-2600 processor (8 MB cache, 3.40 GHz) based on the Sandy Bridge micro-architecture, we achieved a performance of ~ 20 giga floating-point operations per second (GFlops) in double precision, which is two times higher than that of the previously developed code implemented with SSE instructions (Nitadori et al., 2006b) and five times higher than that of a code implemented without any explicit use of SIMD instructions, on the same processor core. We have parallelized the collisional N-body code using the so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~ 90 GFlops for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We can expect to achieve about 10 teraflops (TFlops) for an astronomical collisional system with N ~ 10^5 on massively parallel systems with at most 800 cores based on the Sandy Bridge micro-architecture. This performance will be comparable to that of Graphics Processing Unit (GPU) cluster systems.
This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We present a high-performance N-body code for astronomical collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8MB cache and 3.40 GHz) based on Sandy [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[36,96,3],"tags":[1787,1794,97,258,257],"class_list":["post-3606","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-astrophysics","category-paper","tag-algorithms","tag-astrophysics","tag-instrumentation-and-methods-for-astrophysics","tag-n-body-simulation","tag-stellar-dynamics"],"views":2197,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/3606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3606"}],"version-history":[{"count":0,"href":"https:\/\/hg
pu.org\/index.php?rest_route=\/wp\/v2\/posts\/3606\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3606"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3606"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}