Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
Johns Hopkins University
arXiv:1711.00975 [astro-ph.IM] (2 Nov 2017)
@article{ivkin2017scalable,
title={Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass},
author={Ivkin, Nikita and Liu, Zaoxing and Yang, Lin F. and Kumar, Srinivas Suresh and Lemson, Gerard and Neyrinck, Mark and Szalay, Alexander S. and Braverman, Vladimir and Budavari, Tamas},
year={2017},
month={nov},
eprint={1711.00975},
archivePrefix={arXiv},
primaryClass={astro-ph.IM}
}
Cosmological N-body simulations play a vital role in studying how the Universe evolves. To compare with observations and draw scientific inferences, statistical analysis of large simulation datasets (e.g., finding halos or obtaining multi-point correlation functions) is crucial. However, traditional in-memory methods for these tasks do not scale to the forbiddingly large datasets produced by modern simulations. Our prior paper proposed memory-efficient streaming algorithms that can find the largest halos in a simulation with up to $10^9$ particles on a small server or desktop. However, that approach fails when scaled directly to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques in GPU boosting, sampling, and parallel I/O to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves on the previous results, from finding the $10^3$ largest halos to finding the $10^6$ largest, and reveals the trade-offs among memory, running time, and the number of halos, $k$. Our experiments show that the tool can scale to datasets with up to $10^{12}$ particles while using less than an hour of running time on a single Nvidia GTX GPU.
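
To make the one-pass idea concrete, here is a minimal Python sketch of streaming heavy-hitter detection over grid cells. It is an illustration under stated assumptions, not the paper's implementation: a Count-Min sketch is swapped in as the simplest sketch to show, and the names `CountMinSketch`, `cell_index`, and `densest_cells`, the grid binning, and the default `width` and `depth` values are all hypothetical. Each particle is mapped to a grid cell, the cell's approximate count is updated in a fixed-size table, and only k candidate cells are kept, so memory does not grow with the number of particles.

```python
import random


class CountMinSketch:
    """Fixed-size counter table; estimates may overcount but never undercount."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.prime = (1 << 61) - 1  # modulus for the illustrative hash family
        # One (a, b) pair per row defines the hash h(key) = ((a*key + b) mod p) mod width.
        self.coeffs = [(rng.randrange(1, self.prime), rng.randrange(self.prime))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        return [((a * key + b) % self.prime) % self.width for a, b in self.coeffs]

    def add(self, key):
        for row, col in enumerate(self._columns(key)):
            self.table[row][col] += 1

    def estimate(self, key):
        # The minimum over rows is the estimate least inflated by collisions.
        return min(self.table[row][col] for row, col in enumerate(self._columns(key)))


def cell_index(x, y, z, box_size, cells_per_side):
    """Map a position in [0, box_size]^3 to a single integer cell id (assumed binning)."""
    scale = cells_per_side / box_size
    i = min(int(x * scale), cells_per_side - 1)
    j = min(int(y * scale), cells_per_side - 1)
    k = min(int(z * scale), cells_per_side - 1)
    return (i * cells_per_side + j) * cells_per_side + k


def densest_cells(particles, box_size, cells_per_side, k, width=2048, depth=4):
    """Single pass over particles; returns up to k (cell, estimated count) pairs."""
    sketch = CountMinSketch(width, depth)
    candidates = {}  # cell id -> estimate recorded when the cell was last seen
    for x, y, z in particles:
        cell = cell_index(x, y, z, box_size, cells_per_side)
        sketch.add(cell)
        est = sketch.estimate(cell)
        if cell in candidates or len(candidates) < k:
            candidates[cell] = est
        else:
            # Replace the weakest candidate if the current cell now looks denser.
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[cell] = est
    return sorted(candidates.items(), key=lambda item: -item[1])
```

In this toy version the dense cells stand in for halo candidates. The table size (`width` times `depth`) sets the memory budget and the collision error, which is the kind of memory versus accuracy versus k trade-off the abstract refers to.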
November 7, 2017 by hgpu