Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
Johns Hopkins University
arXiv:1711.00975 [astro-ph.IM], (2 Nov 2017)
Cosmological N-body simulations play a vital role in studying how the Universe evolves. To compare simulations with observations and draw scientific inferences, statistical analysis of large simulation datasets, e.g., finding halos or computing multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to the prohibitively large datasets produced by modern simulations. Our prior paper proposed memory-efficient streaming algorithms that can find the largest halos in a simulation with up to $10^9$ particles on a small server or desktop. However, that approach fails when scaled directly to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques in GPU boosting, sampling, and parallel I/O to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves on the previous results, from finding the $10^3$ largest halos to finding the $10^6$ largest, and reveals the trade-offs between memory, running time, and the number of halos, k. Our experiments show that our tool can scale to datasets with up to $10^{12}$ particles while using less than an hour of running time on a single Nvidia GTX GPU.
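To illustrate the general idea of one-pass, sketch-based halo finding, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not name the sketch it uses, so a Count-Min sketch stands in for it, and the names `CountMinSketch`, `cell_id`, and `top_k_cells`, the unit-cube grid discretization, and the default `width` and `depth` values are all assumptions made for illustration. Particles are mapped to grid cells, each cell's count is approximated in fixed memory, and the k cells with the largest estimated counts are tracked as halo candidates.

```python
import random


class CountMinSketch:
    """Fixed-memory frequency estimator; estimates never undercount."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        self._prime = (1 << 61) - 1  # modulus for the per-row hash functions
        rng = random.Random(seed)
        # One (a, b) pair per row defines the hash h(x) = ((a*x + b) mod p) mod width.
        self._coeffs = [
            (rng.randrange(1, self._prime), rng.randrange(0, self._prime))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        a, b = self._coeffs[row]
        return ((a * key + b) % self._prime) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows is the least-contaminated counter.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


def cell_id(x, y, z, grid):
    """Map a position in the unit cube to an integer grid-cell id (assumed layout)."""
    ix = min(int(x * grid), grid - 1)
    iy = min(int(y * grid), grid - 1)
    iz = min(int(z * grid), grid - 1)
    return (ix * grid + iy) * grid + iz


def top_k_cells(positions, grid, k, width=2048, depth=4):
    """Single pass over particle positions; returns up to k (cell, estimate) pairs."""
    sketch = CountMinSketch(width, depth)
    candidates = {}  # cell id -> latest estimated count, at most k entries
    for x, y, z in positions:
        cell = cell_id(x, y, z, grid)
        sketch.add(cell)
        est = sketch.estimate(cell)
        if cell in candidates or len(candidates) < k:
            candidates[cell] = est
        else:
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[cell] = est
    return sorted(candidates.items(), key=lambda item: -item[1])
```

In this toy version, memory is set by `width * depth` counters plus k candidates, independent of the number of particles, which is the kind of trade-off between memory, running time, and k that the abstract refers to. A call such as `top_k_cells(particle_stream, grid=16, k=2)` accepts any iterable of `(x, y, z)` tuples, so the particles never need to be held in memory at once.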
November 7, 2017 by hgpu