Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives
Institut für Physikalische Chemie, Johannes Gutenberg-Universität Mainz, D-55128 Mainz, Germany
arXiv:1609.08094 [physics.chem-ph] (26 Sep 2016)
@article{eriksen2016efficient,
title={Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives},
author={Eriksen, Janus Juul},
year={2016},
month={sep},
archivePrefix={arXiv},
eprint={1609.08094},
primaryClass={physics.chem-ph}
}
It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity (RI) approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimized device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly optimized to such a degree that the final implementations (using either double and/or single precision arithmetics) are capable of scaling to as large systems as allowed for by the capacity of the host central processing unit (CPU) main memory. The performance of the hybrid CPU/GPU implementations is assessed through calculations on test systems of alanine amino acid chains using one-electron basis sets of increasing size (ranging from double- to pentuple-zeta quality). For all but the smallest problem sizes of the present study, the optimized accelerated codes (using a single multi-core CPU host node in conjunction with six GPUs) are found to be capable of reducing the total time-to-solution by at least an order of magnitude over optimized, OpenMP-threaded CPU-only reference implementations.
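To make the idea in the abstract concrete, the following is a minimal sketch of a closed-shell RI-MP2 energy kernel annotated with OpenACC directives. It is not the paper's code. The function name `rimp2_energy`, the argument layout, and the choice to form integrals in single precision while accumulating the energy in double precision are all assumptions made for illustration. The integrals (ia|jb) are built with plain loops over the auxiliary index rather than with the optimized device math libraries the abstract mentions, so this shows only the directive style, not a tuned implementation.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: closed-shell RI-MP2 correlation energy
 *   E = - sum_{ijab} (ia|jb) [2 (ia|jb) - (ib|ja)] / (e_a + e_b - e_i - e_j)
 * with (ia|jb) approximated as sum_P B[ia][P] * B[jb][P].
 *
 * B        : fitting coefficients, row-major [nocc*nvirt][naux], single precision
 * eps_occ  : occupied orbital energies  (length nocc)
 * eps_virt : virtual orbital energies   (length nvirt)
 *
 * Without an OpenACC compiler the pragmas are ignored and the code runs serially.
 */
double rimp2_energy(const float *B, const double *eps_occ,
                    const double *eps_virt, int nocc, int nvirt, int naux)
{
    double e = 0.0;
    size_t nB = (size_t)nocc * (size_t)nvirt * (size_t)naux;

    /* Copy inputs to the device once; reduce the energy across all threads. */
    #pragma acc parallel loop collapse(4) reduction(+:e) \
        copyin(B[0:nB], eps_occ[0:nocc], eps_virt[0:nvirt])
    for (int i = 0; i < nocc; ++i) {
        for (int j = 0; j < nocc; ++j) {
            for (int a = 0; a < nvirt; ++a) {
                for (int b = 0; b < nvirt; ++b) {
                    const float *Bia = B + ((size_t)i * nvirt + a) * naux;
                    const float *Bjb = B + ((size_t)j * nvirt + b) * naux;
                    const float *Bib = B + ((size_t)i * nvirt + b) * naux;
                    const float *Bja = B + ((size_t)j * nvirt + a) * naux;

                    /* Integrals formed in single precision (assumed choice). */
                    float iajb = 0.0f, ibja = 0.0f;
                    #pragma acc loop seq
                    for (int P = 0; P < naux; ++P) {
                        iajb += Bia[P] * Bjb[P];
                        ibja += Bib[P] * Bja[P];
                    }

                    /* Energy contribution accumulated in double precision. */
                    double denom = eps_virt[a] + eps_virt[b]
                                 - eps_occ[i] - eps_occ[j];
                    e += -((double)iajb * (2.0 * (double)iajb - (double)ibja)) / denom;
                }
            }
        }
    }
    return e;
}
```

As a quick check, a one-orbital toy case with `B = {2.0f}`, `eps_occ = {-1.0}`, `eps_virt = {1.0}` gives (ia|jb) = 4 and a denominator of 4, so the function returns -4.0. The `copyin` clause reflects the abstract's point that data transfers matter: the inputs are moved to the device once, and only the scalar energy comes back.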
September 27, 2016 by hgpu