Performance analysis of parallel gravitational N-body codes on large GPU cluster
National Astronomical Observatories and Key Laboratory of Computational Astrophysics, Chinese Academy of Sciences, Beijing 100012, China
arXiv:1508.02510 [astro-ph.IM] (11 Aug 2015)
@article{huang2015performance,
title={Performance analysis of parallel gravitational N-body codes on large GPU cluster},
author={Huang, Siyi and Spurzem, Rainer and Berczik, Peter},
year={2015},
month={aug},
archivePrefix={arXiv},
eprint={1508.02510},
primaryClass={astro-ph.IM}
}
We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large GPU clusters, NBODY6++ and Bonsai, both pioneers in their own fields as well as on certain shared scales. We benchmark the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to exploit the computational potential of GPUs: their performance approaches half of the maximum single-precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200-300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We give a quantitative comparison of the two codes, finding that in the same cases Bonsai adopts larger time steps and shows larger relative energy errors than NBODY6++, typically 10-50 times larger, depending on the chosen parameters of the codes. While the two codes are built for different astrophysical applications, under specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one with accordingly fine-tuned parameters.
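The abstract quotes two kinds of figures: a predicted speed-up and relative energy errors. The Python sketch below shows how such numbers are commonly interpreted. It uses the textbook definitions (parallel efficiency as speed-up divided by processor count, relative energy error as |E - E0| / |E0|) and assumes "1k" means 1000 processors. The function names and the sample energies are hypothetical and are not taken from NBODY6++, Bonsai, or the paper.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Return speed-up divided by processor count (1.0 = ideal linear scaling)."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    return speedup / n_procs


def relative_energy_error(e_initial: float, e_final: float) -> float:
    """Return |E_final - E_initial| / |E_initial| for a conserved total energy."""
    if e_initial == 0.0:
        raise ValueError("initial energy must be non-zero")
    return abs((e_final - e_initial) / e_initial)


if __name__ == "__main__":
    # Assumption: "1k processors" is read as 1000.
    n_procs = 1000
    for speedup in (200.0, 300.0):
        eff = parallel_efficiency(speedup, n_procs)
        print(f"speed-up {speedup:.0f} on {n_procs} processors -> efficiency {eff:.2f}")

    # Hypothetical energies in N-body units, for illustration only.
    e0, e1 = -0.25, -0.2500025
    print(f"relative energy error: {relative_energy_error(e0, e1):.1e}")
```

Under these assumptions, a speed-up of 200-300 on 1000 processors corresponds to a parallel efficiency of roughly 20-30%.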
August 12, 2015 by hgpu