Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience
National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6354, USA
arXiv:1812.07977 [physics.comp-ph], 19 Dec 2018
@article{budiardja2018targeting,
  title={Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience},
  author={Budiardja, Reuben D. and Cardall, Christian Y.},
  year={2018},
  month={dec},
  eprint={1812.07977},
  archivePrefix={arXiv},
  primaryClass={physics.comp-ph}
}
We use OpenMP directives to target hardware accelerators (GPUs) on Summit, a newly deployed supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), demonstrating simplified access to GPU devices for users of our astrophysics code GenASiS and useful speedup on a sample fluid dynamics problem. At a lower level, we use the capabilities of Fortran 2003 for C interoperability to provide wrappers to the OpenMP device memory runtime library routines (currently available only in C). At a higher level, we use C interoperability and Fortran 2003 type-bound procedures to modify our workhorse class for data storage to include members and methods that significantly streamline the persistent allocation of and on-demand association to GPU memory. Where the rubber meets the road, users offload computational kernels with OpenMP target directives that are rather similar to constructs already familiar from multi-core parallelization. In this initial example we demonstrate total wall time speedups of ~4X in ‘proportional resource tests’ that compare runs using a given percentage of nodes’ GPUs against runs using the same percentage of nodes’ CPU cores, and reasonable weak scaling up to 8000 GPUs vs. 56,000 CPU cores (1333 1/3 Summit nodes). These speedups increase to over 12X when pinned memory is used strategically. We make available the source code from this work.
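The abstract names two concrete ingredients: Fortran 2003 C interoperability to wrap the OpenMP device memory runtime routines (at the time exposed only as a C API), and kernel offload with OpenMP target directives that resemble familiar multi-core constructs. The GenASiS storage class itself is not reproduced here; the following minimal sketch uses hypothetical names (TargetDeviceSketch, UpdateField, U, dU, dT) to illustrate what such interface wrappers and an offloaded loop nest can look like.

! A minimal sketch (not GenASiS source): hypothetical Fortran 2003
! interfaces to the C-only OpenMP device memory routines, plus a kernel
! offloaded with a target directive, in the spirit the abstract describes.
module TargetDeviceSketch   ! hypothetical module name

  use, intrinsic :: iso_c_binding
  implicit none

  interface

    ! void *omp_target_alloc(size_t size, int device_num);
    type(c_ptr) function omp_target_alloc(nBytes, Device) &
        bind(c, name='omp_target_alloc')
      import :: c_ptr, c_size_t, c_int
      integer(c_size_t), value :: nBytes
      integer(c_int), value :: Device
    end function omp_target_alloc

    ! void omp_target_free(void *device_ptr, int device_num);
    subroutine omp_target_free(DevicePtr, Device) &
        bind(c, name='omp_target_free')
      import :: c_ptr, c_int
      type(c_ptr), value :: DevicePtr
      integer(c_int), value :: Device
    end subroutine omp_target_free

    ! int omp_target_associate_ptr(const void *host_ptr,
    !       const void *device_ptr, size_t size,
    !       size_t device_offset, int device_num);
    integer(c_int) function omp_target_associate_ptr &
        (HostPtr, DevicePtr, nBytes, Offset, Device) &
        bind(c, name='omp_target_associate_ptr')
      import :: c_ptr, c_size_t, c_int
      type(c_ptr), value :: HostPtr, DevicePtr
      integer(c_size_t), value :: nBytes, Offset
      integer(c_int), value :: Device
    end function omp_target_associate_ptr

  end interface

contains

  ! Hypothetical update kernel. U and dU are assumed to already be
  ! associated with persistently allocated device memory, so the target
  ! region finds them present and performs no host-device transfer.
  subroutine UpdateField(U, dU, dT)
    real(c_double), dimension(:,:,:), intent(inout) :: U
    real(c_double), dimension(:,:,:), intent(in) :: dU
    real(c_double), intent(in) :: dT
    integer :: iV, jV, kV, n1, n2, n3

    n1 = size(U, dim=1)
    n2 = size(U, dim=2)
    n3 = size(U, dim=3)

    !$OMP target teams distribute parallel do collapse(3)
    do kV = 1, n3
      do jV = 1, n2
        do iV = 1, n1
          U(iV, jV, kV) = U(iV, jV, kV) + dT * dU(iV, jV, kV)
        end do
      end do
    end do
    !$OMP end target teams distribute parallel do

  end subroutine UpdateField

end module TargetDeviceSketch

In a sketch like this, omp_target_alloc and omp_target_associate_ptr would be called once per field to allocate device memory persistently and associate it with the corresponding host array; offloaded kernels such as UpdateField then reuse the device-resident data without per-kernel transfers, consistent with the persistent-allocation, on-demand-association strategy the abstract describes.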
December 23, 2018 by hgpu