high performance computing on graphics processing units: hgpu.org


Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations

Benoit Lange, Pierre Fortin

Sorbonne Universites, UPMC Univ Paris 06, Institut du Calcul et de la Simulation, 75005 Paris, France

hal-00947130 (14 February 2014)

@techreport{lange:hal-00947130,
  hal_id={hal-00947130},
  url={http://hal.upmc.fr/hal-00947130},
  title={Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations},
  author={Lange, Benoit and Fortin, Pierre},
  language={English},
  affiliation={Laboratoire d’Informatique de Paris 6 – LIP6, Institut du Calcul et de la Simulation – ICS},
  pdf={http://hal.upmc.fr/hal-00947130/PDF/article.pdf}
}


In astrophysical N-body simulations, Dehnen’s algorithm, which is based on a dual tree traversal and implemented in the serial falcON code, is faster than serial Barnes-Hut tree-codes but is outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, that targets multi-core CPUs and many-core architectures (Xeon Phi). We focus on both performance and portability while preserving Dehnen’s original algorithm. We first use task parallelism, with either OpenMP or Intel TBB, for the dual tree traversal. We then rely on the SPMD (single-program, multiple-data) model, through the Intel SPMD Program Compiler, for the SIMD vectorization of the near-field part. We compare the performance of pfalcON with related work and finally obtain performance results that match one of the best current tree-code implementations on GPU.
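To make the idea of a dual tree traversal concrete, here is a minimal Python sketch that computes gravitational potentials. It is not the falcON or pfalcON code. It assumes a binary tree split at the median (falcON uses an oct-tree), monopole-only cell-cell interactions with a zeroth-order local expansion (falcON uses higher-order expansions), G = 1, no softening, and a simple acceptance test r_A + r_B < θ·d. All names (`Cell`, `interact`, `interact_self`, `pass_down`, `potentials`) are hypothetical. The leaf-leaf `direct` sums correspond to the near-field part that the abstract says is SIMD-vectorized, and the recursive calls in `interact_self` are where task parallelism would naturally fit.

```python
import math
import random

class Cell:
    """Node of a binary spatial tree (hypothetical stand-in for falcON's oct-tree)."""
    def __init__(self, idx, pos, mass, leaf_size):
        self.idx = idx  # indices of the particles owned by this cell
        self.mass = sum(mass[i] for i in idx)
        # Center of mass: used both as multipole center and local-expansion center.
        self.com = tuple(sum(mass[i] * pos[i][k] for i in idx) / self.mass for k in range(3))
        self.radius = max(math.dist(self.com, pos[i]) for i in idx)
        self.phi_local = 0.0  # far-field potential accumulated at self.com
        self.children = []
        if len(idx) > leaf_size:
            # Split at the median along the axis of largest extent.
            axis = max(range(3), key=lambda k: max(pos[i][k] for i in idx) - min(pos[i][k] for i in idx))
            s = sorted(idx, key=lambda i: pos[i][axis])
            h = len(s) // 2
            self.children = [Cell(s[:h], pos, mass, leaf_size), Cell(s[h:], pos, mass, leaf_size)]

def direct(a_idx, b_idx, pos, mass, phi):
    """Near field: exact mutual sum between two disjoint particle sets (G = 1, no softening)."""
    for i in a_idx:
        for j in b_idx:
            inv_r = 1.0 / math.dist(pos[i], pos[j])
            phi[i] -= mass[j] * inv_r
            phi[j] -= mass[i] * inv_r

def interact(a, b, pos, mass, phi, theta):
    """Mutual interaction of two disjoint cells: the core of the dual tree traversal."""
    d = math.dist(a.com, b.com)
    if a.radius + b.radius < theta * d:
        # Far field: symmetric cell-cell monopole interaction, stored at both centers.
        a.phi_local -= b.mass / d
        b.phi_local -= a.mass / d
    elif not a.children and not b.children:
        direct(a.idx, b.idx, pos, mass, phi)
    elif not b.children or (a.children and a.radius >= b.radius):
        for c in a.children:  # split the larger (or only splittable) cell
            interact(c, b, pos, mass, phi, theta)
    else:
        for c in b.children:
            interact(a, c, pos, mass, phi, theta)

def interact_self(a, pos, mass, phi, theta):
    """Self-interaction of one cell: every particle pair inside it is visited exactly once."""
    if not a.children:
        for n, i in enumerate(a.idx):
            direct([i], a.idx[n + 1:], pos, mass, phi)
        return
    c1, c2 = a.children
    # The two self-interactions touch disjoint particles, so they are natural candidates for
    # independent tasks; the mutual call updates both children, so in a task-parallel version
    # it must not run concurrently with them without synchronization.
    interact_self(c1, pos, mass, phi, theta)
    interact_self(c2, pos, mass, phi, theta)
    interact(c1, c2, pos, mass, phi, theta)

def pass_down(a, phi, inherited=0.0):
    """Zeroth-order local expansion: each particle receives the far field of its cell and ancestors."""
    total = inherited + a.phi_local
    if not a.children:
        for i in a.idx:
            phi[i] += total
    for c in a.children:
        pass_down(c, phi, total)

def potentials(pos, mass, theta=0.5, leaf_size=8):
    phi = [0.0] * len(pos)
    root = Cell(list(range(len(pos))), pos, mass, leaf_size)
    interact_self(root, pos, mass, phi, theta)
    pass_down(root, phi)
    return phi

def potentials_direct(pos, mass):
    """O(N^2) reference used to check the traversal."""
    n = len(pos)
    return [-sum(mass[j] / math.dist(pos[i], pos[j]) for j in range(n) if j != i) for i in range(n)]

if __name__ == "__main__":
    random.seed(0)
    n = 500
    pos = [tuple(random.random() for _ in range(3)) for _ in range(n)]
    mass = [1.0 / n] * n
    ref = potentials_direct(pos, mass)
    approx = potentials(pos, mass, theta=0.5)
    print("max relative error:", max(abs(a - r) / abs(r) for a, r in zip(approx, ref)))
```

When a cell pair is accepted, every particle of one cell lies at a distance between d(1 − θ) and d(1 + θ) from every particle of the other. As a result, each monopole estimate stays within a relative error θ of the exact pair contribution. Setting θ = 0 disables the far field entirely and reproduces the direct O(N²) sum, which makes a convenient correctness check.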

Tags: Algorithms, Astrophysics, CUDA, N-body simulation, nVidia, Tesla C2070, Tesla K20

February 27, 2014 by hgpu
