Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience
National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6354, USA
arXiv:1812.07977 [physics.comp-ph], 19 Dec 2018
@article{budiardja2018targeting,
  title={Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience},
  author={Budiardja, Reuben D. and Cardall, Christian Y.},
  year={2018},
  month={dec},
  eprint={1812.07977},
  archivePrefix={arXiv},
  primaryClass={physics.comp-ph}
}
We use OpenMP directives to target hardware accelerators (GPUs) on Summit, a newly deployed supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), demonstrating simplified access to GPU devices for users of our astrophysics code GenASiS and useful speedup on a sample fluid dynamics problem. At a lower level, we use the capabilities of Fortran 2003 for C interoperability to provide wrappers to the OpenMP device memory runtime library routines (currently available only in C). At a higher level, we use C interoperability and Fortran 2003 type-bound procedures to modify our workhorse class for data storage to include members and methods that significantly streamline the persistent allocation of and on-demand association to GPU memory. Where the rubber meets the road, users offload computational kernels with OpenMP target directives that are rather similar to constructs already familiar from multi-core parallelization. In this initial example we demonstrate total wall time speedups of ~4X in ‘proportional resource tests’ that compare runs using a given percentage of nodes’ GPUs against runs using the same percentage of nodes’ CPU cores, and reasonable weak scaling up to 8000 GPUs vs. 56,000 CPU cores (1333 1/3 Summit nodes). These speedups increase to over 12X when pinned memory is used strategically. We make available the source code from this work.
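The abstract names two concrete ingredients: Fortran 2003 C interoperability to wrap the OpenMP device memory runtime routines (at the time exposed only as a C API), and kernel offload with OpenMP target directives that resemble familiar multi-core constructs. The GenASiS storage class itself is not reproduced here; the following minimal sketch uses hypothetical names (TargetDeviceSketch, UpdateField, U, dU, dT) to illustrate what such interface wrappers and an offloaded loop nest can look like.

! A minimal sketch (not GenASiS source): hypothetical Fortran 2003
! interfaces to the C-only OpenMP device memory routines, plus a kernel
! offloaded with a target directive, in the spirit the abstract describes.
module TargetDeviceSketch   ! hypothetical module name

  use, intrinsic :: iso_c_binding
  implicit none

  interface

    ! void *omp_target_alloc(size_t size, int device_num);
    type(c_ptr) function omp_target_alloc(nBytes, Device) &
        bind(c, name='omp_target_alloc')
      import :: c_ptr, c_size_t, c_int
      integer(c_size_t), value :: nBytes
      integer(c_int), value :: Device
    end function omp_target_alloc

    ! void omp_target_free(void *device_ptr, int device_num);
    subroutine omp_target_free(DevicePtr, Device) &
        bind(c, name='omp_target_free')
      import :: c_ptr, c_int
      type(c_ptr), value :: DevicePtr
      integer(c_int), value :: Device
    end subroutine omp_target_free

    ! int omp_target_associate_ptr(const void *host_ptr,
    !       const void *device_ptr, size_t size,
    !       size_t device_offset, int device_num);
    integer(c_int) function omp_target_associate_ptr &
        (HostPtr, DevicePtr, nBytes, Offset, Device) &
        bind(c, name='omp_target_associate_ptr')
      import :: c_ptr, c_size_t, c_int
      type(c_ptr), value :: HostPtr, DevicePtr
      integer(c_size_t), value :: nBytes, Offset
      integer(c_int), value :: Device
    end function omp_target_associate_ptr

  end interface

contains

  ! Hypothetical update kernel. U and dU are assumed to already be
  ! associated with persistently allocated device memory, so the target
  ! region finds them present and performs no host-device transfer.
  subroutine UpdateField(U, dU, dT)
    real(c_double), dimension(:,:,:), intent(inout) :: U
    real(c_double), dimension(:,:,:), intent(in) :: dU
    real(c_double), intent(in) :: dT
    integer :: iV, jV, kV, n1, n2, n3

    n1 = size(U, dim=1)
    n2 = size(U, dim=2)
    n3 = size(U, dim=3)

    !$OMP target teams distribute parallel do collapse(3)
    do kV = 1, n3
      do jV = 1, n2
        do iV = 1, n1
          U(iV, jV, kV) = U(iV, jV, kV) + dT * dU(iV, jV, kV)
        end do
      end do
    end do
    !$OMP end target teams distribute parallel do

  end subroutine UpdateField

end module TargetDeviceSketch

In a sketch like this, omp_target_alloc and omp_target_associate_ptr would be called once per field to allocate device memory persistently and associate it with the corresponding host array; offloaded kernels such as UpdateField then reuse the device-resident data without per-kernel transfers, consistent with the persistent-allocation, on-demand-association strategy the abstract describes.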
December 23, 2018 by hgpu