Fast Galactic Structure Finding using Graphics Processing Units
Computer Science Department, University of Cape Town
University of Cape Town, 2014
@article{wood2014master,
title={Fast Galactic Structure Finding using Graphics Processing Units},
author={Wood, Daniel},
year={2014}
}
Cosmological simulations are used by astronomers to investigate large scale structure formation and galaxy evolution. Structure finding, that is, the discovery of gravitationally-bound objects such as dark matter halos, is a crucial step in many such simulations. During recent years, advancing computational capacity has lead to halo-finders needing to manage increasingly larger simulations. As a result, many multi-core solutions have arisen in an attempt to process these simulations more efficiently. However, a many-core approach to the problem using graphics processing units (GPUs) appears largely unexplored. Since these simulations are inherently n-body problems, they contain a high degree of parallelism, which makes them very well suited to a GPU architecture. Therefore, it makes sense to determine the potential for further research in halo-finding algorithms on a GPU. We present a number of modified algorithms, for accelerating the identification of halos and sub-structures, using entry-level graphics hardware. The algorithms are based on an adaptive hierarchical refinement of the friends-of-friends (FoF) method using six phase-space dimensions: This allows for robust tracking of sub-structures. These methods are highly amenable to parallel implementation and run on GPUs. We implemented four separate systems; two on GPUs and two on CPUs. The first system for both CPU and GPU was implemented as a proof of concept exercise to familiarise us with the problem: These utilised minimum spanning trees (MSTs) and brute force methods. Our second implementation, for the CPU and GPU, capitalised on knowledge gained from the proof of concept applications, leading us to use kd-trees to efficiently solve the problem. The CPU implementations were intended to serve as benchmarks for our GPU applications. In order to verify the efficacy of the implemented systems, we applied our halo finders to cosmological simulations of varying size and compared the results obtained to those given by a widely used FoF commercial halo-finder. To conduct a fair comparison, CPU benchmarks were implemented using well-known libraries optimised for these calculations. The best performing implementation, with minimal optimisation, used kd-trees on the GPU. This achieved a 12x speed-up over our CPU implementation, which used similar methods. The same GPU implementation was compared with a current, widely-used commercial halo finder FoF system, and achieved a 2x speed-up for up to 5 million particles. Results suggest a scalable solution, where speed-up increases with the size of dataset used. We conclude that there is great potential for future research into an optimised kd-tree implementation on graphics hardware for the problem of structure finding in cosmological simulations.
July 1, 2014 by hgpu