Performance Portability Challenges for Fortran Applications

Abigail Hsu, David Neill Asanza, Joseph A. Schoonover, Zach Jibben, Neil N. Carlson, Robert Robey

Stony Brook University

IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 2018

Source code package: Truchas: 3D Multiphysics Simulation of Metal Casting and Processing

This project investigates how different approaches to parallel optimization affect the performance portability of Fortran codes. In addition, we explore the productivity challenges that arise from software tool-chain limitations unique to Fortran. For this study, we build upon the Truchas software, a metal casting manufacturing simulation code based on unstructured mesh methods, and upon our initial efforts to accelerate two key routines: the gradient and mimetic finite difference calculations. The acceleration methods include OpenMP, for CPU multi-threading and GPU offloading, and CUDA for GPU offloading. Through this study, we find that the best optimization approach depends on the relative priority of performance versus effort and on the architectures that are targeted. CUDA is the most attractive where performance is the main priority, whereas the OpenMP approaches on CPU and GPU are preferable when emphasizing productivity. Furthermore, OpenMP for the CPU is the most portable across architectures. OpenMP for CPU multi-threading yields 3%-5% of achievable performance, whereas GPU offloading generally results in roughly 74%-90% of achievable performance. However, GPU offloading with OpenMP 4.5 results in roughly 5% of peak performance for the mimetic finite difference algorithm, suggesting that further serial code optimization is needed to tune this kernel. In general, these results imply low performance portability, below 10% as estimated by the Pennycook metric. Though these specific results are particular to this application, we argue that they are typical of many current scientific HPC applications and highlight the hurdles we will need to overcome on the path to exascale.
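The Pennycook metric mentioned in the abstract is commonly defined as the harmonic mean of an application's efficiency over a set of platforms, and as zero if the application fails to run on any platform in the set. The sketch below is only an illustration of that definition, not the authors' evaluation code. The function name `performance_portability`, the platform labels, and the efficiency values are all assumptions; the two values are picked from within the ranges quoted in the abstract and are not the paper's per-platform measurements.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    A value of None (or a non-positive efficiency) marks a platform on
    which the application is unsupported, which makes the metric zero.
    """
    if not efficiencies:
        raise ValueError("at least one platform is required")

    reciprocal_sum = 0.0
    for efficiency in efficiencies.values():
        if efficiency is None or efficiency <= 0.0:
            return 0.0  # unsupported on one platform: not portable across the set
        reciprocal_sum += 1.0 / efficiency

    return len(efficiencies) / reciprocal_sum


# Hypothetical inputs: one CPU and one GPU platform, with illustrative
# efficiencies (fractions of achievable performance), not measured data.
example = {"cpu_openmp": 0.05, "gpu_offload": 0.80}
print(f"{performance_portability(example):.3f}")  # prints 0.094
```

Because a harmonic mean is dominated by its smallest term, a single platform at a few percent efficiency pulls the overall score below 10% even when another platform runs at 80%, which is consistent with the low portability figure reported above.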

Tags: Computer science, CUDA, Finite difference, Fortran, nVidia, OpenMP, Package, performance portability, Tesla P100, Tesla V100

December 2, 2018 by hgpu
