18926

Raising the Performance of the Tinker-HP Molecular Modeling Package on Intel’s HPC Architectures: a Living Review [Article v1.0]

Luc-Henri Jolly, Alejandro Duran, Louis Lagardere, Jay W. Ponder, Pengyu Ren, Jean-Philip Piquemal
Institut Parisien de Chimie Physique et Theorique, Sorbonne Universite, FR 2622 CNRS, 75005, Paris France
arXiv:1906.01211 [cs.MS], (4 Jun 2019)

@misc{jolly2019raising,

   title={Raising the Performance of the Tinker-HP Molecular Modeling Package on Intel’s HPC Architectures: a Living Review [Article v1.0]},

   author={Jolly, Luc-Henri and Duran, Alejandro and Lagardere, Ponder, Louis Jay W. and Ren, Pengyu and Piquemal, Jean-Philip},

   year={2019},

   eprint={1906.01211},

   archivePrefix={arXiv},

   primaryClass={cs.MS}

}

This living paper reviews the present High Performance Computing (HPC) capabilities of the Tinker-HP molecular modeling package. We focus here on the reference, double precision, massively parallel molecular dynamics engine present in Tinker-HP and dedicated to perform large scale simulations. We show how it can be adapted to recent Intel Central Processing Unit (CPU) petascale architectures. First, we discuss the new set of Intel Advanced Vector Extensions 512 (Intel AVX-512) instructions present in recent Intel processors (e.g., the Intel Xeon Scalable and Intel Xeon Phi 2nd generation processors) allowing for larger vectorization enhancements. These instructions constitute the central source of potential computational gains when using the latest processors, justifying important vectorization efforts for developers. We then briefly review the organization of the Tinker-HP code and identify the computational hotspots which require Intel AVX-512 optimization and we propose a general and optimal strategy to vectorize those particular parts of the code. We intended to present our optimization strategy in a pedagogical way so it could benefit to other researchers and students interested in gaining performances in their own software. Finally we present the performance enhancements obtained compared to the unoptimized code both sequentially and at the scaling limit in parallel for classical non-polarizable (CHARMM) and polarizable force fields (AMOEBA). This paper never ceases to be updated as we accumulate new data on the associated Github repository between new versions of this living paper.
Rating: 5.0/5. From 1 vote.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: