General-purpose molecular dynamics simulations on GPU-based clusters
Institut für Physik, University of Technology Ilmenau, 98684 Ilmenau, Germany
arXiv:1009.4330v2 [cond-mat.mtrl-sci] (6 Mar 2011)
@article{2010arXiv1009.4330T,
title={{General-purpose molecular dynamics simulations on GPU-based clusters}},
author={{Trott}, C.~R. and {Winterfeld}, L. and {Crozier}, P.~S.},
journal={arXiv preprint arXiv:1009.4330},
archivePrefix={arXiv},
eprint={1009.4330},
primaryClass={cond-mat.mtrl-sci},
keywords={Condensed Matter - Materials Science, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Physics - Computational Physics},
year={2010},
month={sep},
adsurl={http://adsabs.harvard.edu/abs/2010arXiv1009.4330T},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
We present a GPU implementation of LAMMPS, a widely used parallel molecular dynamics (MD) software package, and show 5x to 13x single-node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, and standard LAMMPS C++ code for CPU execution. Cell and neighbor list approaches are compared for best performance on GPUs, with thread-per-atom and block-per-atom neighbor list variants showing the best performance at low and high neighbor counts, respectively. Computational performance results of GPU-enabled LAMMPS are presented for a variety of materials classes (e.g. biomolecules, polymers, metals, semiconductors), along with a speed comparison versus other available GPU-enabled MD software. Finally, we show strong and weak scaling performance on a CPU/GPU cluster using up to 128 dual-GPU nodes.
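To illustrate the thread-per-atom strategy mentioned in the abstract, here is a minimal CUDA sketch of a neighbor-list force kernel in which each thread processes one atom's full neighbor list. The kernel name, data layout, and Lennard-Jones coefficients (lj1, lj2) are illustrative assumptions for this sketch, not the actual LAMMPS CUDA package code.

```cuda
// Minimal thread-per-atom Lennard-Jones force kernel over a dense neighbor list.
// Single atom type, no periodic images; layout and names are assumptions.
#include <cuda_runtime.h>

__global__ void lj_force_thread_per_atom(const float4* __restrict__ pos,    // x,y,z per atom
                                         float4* __restrict__ force,        // output fx,fy,fz per atom
                                         const int* __restrict__ neighbors, // flattened neighbor lists
                                         const int* __restrict__ numneigh,  // neighbor count per atom
                                         int maxneigh, int nlocal,
                                         float cutsq, float lj1, float lj2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nlocal) return;

    float4 xi = pos[i];
    float fx = 0.f, fy = 0.f, fz = 0.f;
    int n = numneigh[i];

    // One thread walks the entire neighbor list of atom i.
    for (int k = 0; k < n; ++k) {
        int j = neighbors[i * maxneigh + k];
        float dx = xi.x - pos[j].x;
        float dy = xi.y - pos[j].y;
        float dz = xi.z - pos[j].z;
        float rsq = dx * dx + dy * dy + dz * dz;
        if (rsq < cutsq) {
            float r2inv = 1.f / rsq;
            float r6inv = r2inv * r2inv * r2inv;
            float fpair = r6inv * (lj1 * r6inv - lj2) * r2inv;
            fx += dx * fpair;
            fy += dy * fpair;
            fz += dz * fpair;
        }
    }
    force[i] = make_float4(fx, fy, fz, 0.f);
}
```

In a block-per-atom variant, by contrast, the threads of one block would share the neighbor loop of a single atom and combine their partial forces with a block-level reduction, which pays off when neighbor counts are large enough to keep a full block busy.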
March 8, 2011 by hgpu