Scaling Lattice QCD beyond 100 GPUs

R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, S. Gottlieb
Center for Computational Science, Boston University, Boston, MA 02215, USA
arXiv:1109.2935v1 [hep-lat] (13 Sep 2011)


@article{babich2011scaling,
   title={Scaling Lattice QCD beyond 100 GPUs},
   author={Babich, R. and Clark, M. A. and Joo, B. and Shi, G. and Brower, R. C. and Gottlieb, S.},
   journal={ArXiv e-prints},
   eprint={1109.2935},
   year={2011},
   keywords={High Energy Physics - Lattice, Computational Physics}
}








Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo "gauge field generation" phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Such strong scaling has not been previously achieved. In this contribution, we demonstrate that using a multi-dimensional parallelization strategy and a domain-decomposed preconditioner allows us to scale into this regime. We present results for two popular discretizations of the Dirac operator, Wilson-clover and improved staggered, employing up to 256 GPUs on the Edge cluster at Lawrence Livermore National Laboratory.
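The gain from the multi-dimensional parallelization mentioned above can be illustrated with a surface-to-volume argument: partitioning a 4D lattice across GPUs in only one dimension quickly exhausts that dimension's extent at O(100) GPUs, while splitting in several dimensions keeps the halo (boundary) traffic per GPU small relative to its local volume. The sketch below is illustrative only; the lattice size and process grids are hypothetical and not taken from the paper.

```python
from math import prod

def surface_to_volume(global_dims, grid):
    """Ratio of halo sites to local sites for one GPU's sublattice.

    global_dims: extents of the full 4D lattice (hypothetical here).
    grid: number of GPUs along each dimension; only partitioned
    directions (p > 1) contribute two exchanged faces each.
    """
    assert all(g % p == 0 for g, p in zip(global_dims, grid))
    loc = [g // p for g, p in zip(global_dims, grid)]
    vol = prod(loc)
    # Two faces per partitioned direction; each face has vol / extent sites.
    surf = sum(2 * vol // l for l, p in zip(loc, grid) if p > 1)
    return surf / vol

G = (32, 32, 32, 256)                       # hypothetical global lattice
r1 = surface_to_volume(G, (1, 1, 1, 256))   # 1-D split over 256 GPUs
r4 = surface_to_volume(G, (2, 2, 4, 16))    # 4-D split over 256 GPUs
print(f"1-D split: {r1:.3f}   4-D split: {r4:.3f}")
# With a 1-D split the local time extent is 1, so every site is a
# boundary site in both directions; the 4-D split keeps the ratio low.
```

At 256 GPUs the 1-D decomposition communicates more sites than it keeps local, whereas the multi-dimensional split retains a modest surface-to-volume ratio, which is why this strategy (together with a domain-decomposed preconditioner that suppresses inter-GPU communication further) is what enables strong scaling into this regime.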
