Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience
Reuben D. Budiardja, Christian Y. Cardall
National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6354, USA
arXiv:1812.07977 [physics.comp-ph], 19 Dec 2018
@article{budiardja2018targeting,
title={Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience},
author={Budiardja, Reuben D. and Cardall, Christian Y.},
year={2018},
month={dec},
eprint={1812.07977},
archivePrefix={arXiv},
primaryClass={physics.comp-ph}
}
We use OpenMP directives to target hardware accelerators (GPUs) on Summit, a newly deployed supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), demonstrating simplified access to GPU devices for users of our astrophysics code GenASiS and useful speedup on a sample fluid dynamics problem. At a lower level, we use the capabilities of Fortran 2003 for C interoperability to provide wrappers to the OpenMP device memory runtime library routines (currently available only in C). At a higher level, we use C interoperability and Fortran 2003 type-bound procedures to modify our workhorse class for data storage to include members and methods that significantly streamline the persistent allocation of and on-demand association to GPU memory. Where the rubber meets the road, users offload computational kernels with OpenMP target directives that are rather similar to constructs already familiar from multi-core parallelization. In this initial example we demonstrate total wall time speedups of ~4X in ‘proportional resource tests’ that compare runs using a given percentage of nodes’ GPUs against runs using the same percentage of nodes’ CPU cores, and reasonable weak scaling up to 8000 GPUs vs. 56,000 CPU cores (1333 1/3 Summit nodes). These speedups increase to over 12X when pinned memory is used strategically. We make available the source code from this work.
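To make the two ingredients of the abstract concrete, here is a minimal sketch (not the GenASiS code itself; module, program, and variable names below are illustrative only). First, a Fortran 2003 C-interoperable interface to one of the OpenMP device memory runtime routines; omp_target_alloc is a standard OpenMP runtime routine specified in C, and ISO_C_BINDING lets Fortran call it directly:

module DeviceMemory_Module
  use iso_c_binding
  implicit none

  interface
    ! C prototype: void * omp_target_alloc ( size_t size, int device_num )
    function omp_target_alloc ( nBytes, DeviceNumber ) &
               bind ( c, name = 'omp_target_alloc' ) result ( DevicePointer )
      import :: c_ptr, c_size_t, c_int
      integer ( c_size_t ), value :: nBytes
      integer ( c_int ), value :: DeviceNumber
      type ( c_ptr ) :: DevicePointer
    end function omp_target_alloc
  end interface

end module DeviceMemory_Module

Second, a computational kernel offloaded with an OpenMP target directive, which (as the abstract notes) looks much like the familiar multi-core parallel do construct; the flux-like computation here is a stand-in, not the paper's fluid dynamics kernel:

program Kernel_Sketch
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real, allocatable :: U ( : ), F ( : )

  allocate ( U ( n ), F ( n ) )
  U = 1.0

  ! Offload the loop to the GPU; map clauses handle host-device transfer
  !$omp target teams distribute parallel do map ( to: U ) map ( from: F )
  do i = 1, n
    F ( i ) = 0.5 * U ( i ) * U ( i )
  end do
  !$omp end target teams distribute parallel do

  print *, 'F(1) = ', F ( 1 )
end program Kernel_Sketch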
December 23, 2018 by hgpu