Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR

Hsi-Yu Schive, Ui-Han Zhang, Tzihong Chiueh
Department of Physics, National Taiwan University, 10617, Taipei, Taiwan
arXiv:1103.3373v1 [astro-ph.IM] (17 Mar 2011)

@article{2011arXiv1103.3373S,
   author        = {{Schive}, H.-Y. and {Zhang}, U.-H. and {Chiueh}, T.},
   title         = {Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR},
   journal       = {ArXiv e-prints},
   archivePrefix = {arXiv},
   eprint        = {1103.3373},
   primaryClass  = {astro-ph.IM},
   keywords      = {Astrophysics - Instrumentation and Methods for Astrophysics},
   year          = {2011},
   month         = mar,
   adsurl        = {http://adsabs.harvard.edu/abs/2011arXiv1103.3373S},
   adsnote       = {Provided by the SAO/NASA Astrophysics Data System}
}

We present the implementation and performance of a class of directionally unsplit Riemann-solver-based hydrodynamic schemes on graphics processing units (GPUs). These schemes, including the MUSCL-Hancock method, a variant of the MUSCL-Hancock method, and the corner-transport-upwind method, are embedded into the adaptive-mesh-refinement (AMR) code GAMER. Furthermore, a hybrid MPI/OpenMP model is investigated, which enables the full exploitation of the computing power in a heterogeneous CPU/GPU cluster and significantly improves the overall performance. Performance benchmarks are conducted on the Dirac GPU cluster at NERSC/LBNL using up to 32 Tesla C2050 GPUs. A single GPU achieves speed-ups of 101 (25) and 84 (22) for uniform-mesh and AMR simulations, respectively, as compared with the performance using one (four) CPU core(s), and the excellent performance persists in multi-GPU tests. In addition, we make a direct comparison between GAMER and the widely adopted CPU code Athena (Stone et al. 2008) in adiabatic hydrodynamic tests and demonstrate that, with the same accuracy, GAMER is able to achieve a two-orders-of-magnitude performance speed-up.
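The hybrid parallelization described in the abstract is commonly structured as one MPI process per GPU, with OpenMP threads preparing AMR patch data on the CPU while asynchronous CUDA streams keep the GPU busy. The following is a minimal sketch of that pattern, not GAMER's actual code: the patch size, the kernel body (a stand-in for the unsplit hydro update), and all buffer names are placeholders.

    // Minimal sketch of a hybrid MPI/OpenMP/GPU patch update.
    // Compile (setup-dependent), e.g.: nvcc -Xcompiler -fopenmp hybrid.cu -lmpi
    #include <mpi.h>
    #include <omp.h>
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    #define PATCH_SIZE 8   // cells per patch edge (hypothetical)
    #define N_CELLS (PATCH_SIZE*PATCH_SIZE*PATCH_SIZE)

    // Placeholder for the directionally unsplit solver (e.g. MUSCL-Hancock);
    // the real scheme evaluates Riemann fluxes in all three directions.
    __global__ void HydroUpdate(float *u, float dt)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < N_CELLS) u[i] += dt*0.0f;   // stand-in for the flux update
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nGPU;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaGetDeviceCount(&nGPU);
        cudaSetDevice(rank % nGPU);          // one GPU per MPI process

        const int nPatch = 64;               // patches owned by this rank
        std::vector<float*> h_buf(nPatch), d_buf(nPatch);
        std::vector<cudaStream_t> stream(nPatch);
        for (int p = 0; p < nPatch; p++) {
            cudaMallocHost((void**)&h_buf[p], N_CELLS*sizeof(float)); // pinned
            cudaMalloc    ((void**)&d_buf[p], N_CELLS*sizeof(float));
            cudaStreamCreate(&stream[p]);
        }

        // OpenMP threads fill patch buffers (ghost zones, interpolation, ...)
        // on the CPU; per-patch streams let copies and kernels overlap.
        #pragma omp parallel for schedule(dynamic)
        for (int p = 0; p < nPatch; p++) {
            for (int i = 0; i < N_CELLS; i++) h_buf[p][i] = 1.0f; // fake prep
            cudaMemcpyAsync(d_buf[p], h_buf[p], N_CELLS*sizeof(float),
                            cudaMemcpyHostToDevice, stream[p]);
            HydroUpdate<<<(N_CELLS+255)/256, 256, 0, stream[p]>>>(d_buf[p], 1e-3f);
            cudaMemcpyAsync(h_buf[p], d_buf[p], N_CELLS*sizeof(float),
                            cudaMemcpyDeviceToHost, stream[p]);
        }
        cudaDeviceSynchronize();

        // Inter-rank boundary exchange would follow here (placeholder).
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) printf("updated %d patches per rank\n", nPatch);

        for (int p = 0; p < nPatch; p++) {
            cudaFreeHost(h_buf[p]); cudaFree(d_buf[p]); cudaStreamDestroy(stream[p]);
        }
        MPI_Finalize();
        return 0;
    }

Assigning GPUs by local rank and using dynamic OpenMP scheduling over patches is one plausible way to realize the overlap of CPU preparation, PCIe transfers, and GPU kernels that the paper credits for its performance gains.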