Design of MILC Lattice QCD Application for GPU Clusters
National Center for Supercomputing Applications (NCSA), University of Illinois, Urbana, IL, USA
IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2011
@inproceedings{shi2011design,
title={Design of MILC lattice QCD application for GPU clusters},
author={Shi, G. and Gottlieb, S. and Torok, A. and Kindratenko, V.},
booktitle={Parallel \& Distributed Processing Symposium (IPDPS), 2011 IEEE International},
pages={363--371},
year={2011},
organization={IEEE}
}
We present an implementation of the improved staggered quark action lattice QCD computation designed for execution on a GPU cluster. The parallelization strategy is based on dividing the space-time lattice along the time dimension and distributing the sub-lattices among the GPU cluster nodes. We provide a mixed-precision floating-point GPU implementation of the multi-mass conjugate gradient solver. Our single-GPU implementation of the conjugate gradient solver achieves a 9x performance improvement over the highly optimized code executed on a state-of-the-art eight-core CPU node. The overall application executes almost six times faster on a GPU-enabled cluster than on a conventional multi-core cluster. The developed code is currently used for running production QCD calculations with electromagnetic corrections.
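As a rough illustration of the time-dimension decomposition described in the abstract, the following Python sketch assigns contiguous blocks of time slices to ranks. It is not taken from the MILC code: the function names, the balanced block assignment, and the periodic wrap-around between the first and last rank are all assumptions made for the example.

```python
def time_slices_for_rank(nt, n_ranks, rank):
    """Return the half-open range [t_start, t_end) of time slices owned by `rank`.

    Hypothetical helper: splits nt time slices into contiguous, balanced blocks.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(nt, n_ranks)
    # The first `extra` ranks take one additional slice so the load stays balanced.
    t_start = rank * base + min(rank, extra)
    t_end = t_start + base + (1 if rank < extra else 0)
    return t_start, t_end


def time_neighbors(n_ranks, rank):
    """Ranks holding the adjacent sub-lattices (assumes periodic wrap-around in time)."""
    return (rank - 1) % n_ranks, (rank + 1) % n_ranks


def local_sites(nx, ny, nz, nt, n_ranks, rank):
    """Number of lattice sites stored on `rank`; spatial dimensions are not split."""
    t_start, t_end = time_slices_for_rank(nt, n_ranks, rank)
    return nx * ny * nz * (t_end - t_start)
```

For instance, with a hypothetical 24 x 24 x 24 x 64 lattice on four ranks, `time_slices_for_rank(64, 4, 1)` gives the range `(16, 32)`, and each rank only needs to exchange boundary time slices with the two ranks returned by `time_neighbors`.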
November 15, 2011 by hgpu