Scaling Soft Matter Physics to Thousands of GPUs in Parallel
EPCC, The University of Edinburgh
Advances in Engineering Software, 2013
@article{gray2013scaling,
title={Scaling Soft Matter Physics to Thousands of GPUs in Parallel},
author={Gray, Alan and Hart, Alistair and Henrich, Oliver and Stratford, Kevin},
year={2013}
}
We describe a multi-GPU implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology for augmenting the original CPU version with GPU functionality in a maintainable fashion. We present several optimisations that maximise performance on the GPU architecture through tuning for the GPU memory hierarchy. We describe how we implement particles within the fluid in such a way as to avoid a major divergence of the CPU and GPU codebases, whilst minimising data transfer at each timestep. We detail the code's halo-exchange communication phase, which exploits overlapping to allow efficient parallel scaling to many GPUs. We present results showing that the application demonstrates excellent scaling to at least 8192 GPUs in parallel, the largest system tested at the time of writing. The GPU version (on NVIDIA K20X GPUs) is around 3.5 to 5 times faster than the CPU version (on fully-utilised AMD Opteron 6274 16-core CPUs), when comparing equal numbers of CPUs and GPUs.
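The halo-exchange idea mentioned in the abstract can be illustrated with a small sketch. This is not code from Ludwig: it is a plain-Python toy on a 1D periodic lattice, and the names `stencil`, `split`, `step_global` and `step_decomposed` are hypothetical. It assumes that "overlapping" means updating interior sites, which need no remote data, while halo values are in transit. Here the phases run one after another, so only the dependency structure is shown, not real concurrency.

```python
# Toy illustration of domain decomposition with halo exchange.
# Assumption: a 3-point averaging stencil stands in for the real lattice update.

def stencil(left, centre, right):
    """Weighted average of a site and its two neighbours."""
    return 0.25 * left + 0.5 * centre + 0.25 * right

def step_global(u):
    """Reference update on the whole periodic lattice, with no decomposition."""
    n = len(u)
    return [stencil(u[(i - 1) % n], u[i], u[(i + 1) % n]) for i in range(n)]

def split(u, parts):
    """Cut the lattice into equal blocks (assumes len(u) is divisible by parts)."""
    size = len(u) // parts
    return [u[r * size:(r + 1) * size] for r in range(parts)]

def step_decomposed(blocks):
    """One update over blocks standing in for per-GPU subdomains (each of length >= 2)."""
    p = len(blocks)
    # Phase 1: interior sites depend only on local data, so on a real machine
    # this work could proceed while halo messages are still in flight.
    interiors = [
        [stencil(b[i - 1], b[i], b[i + 1]) for i in range(1, len(b) - 1)]
        for b in blocks
    ]
    # Phase 2: halo exchange. Each block receives the edge value of each
    # periodic neighbour (a stand-in for device copies and network messages).
    halos = [(blocks[(r - 1) % p][-1], blocks[(r + 1) % p][0]) for r in range(p)]
    # Phase 3: boundary sites are updated once the halo values have arrived.
    updated = []
    for b, inner, (lo, hi) in zip(blocks, interiors, halos):
        first = stencil(lo, b[0], b[1])
        last = stencil(b[-2], b[-1], hi)
        updated.append([first] + inner + [last])
    return updated
```

Running `step_decomposed` on `split(u, 3)` for several steps and concatenating the blocks should reproduce what `step_global` gives on `u`, which is a useful sanity check for any decomposition of this kind.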
October 15, 2013 by hgpu