high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Portability of Fortran’s ‘do concurrent’ on GPUs

Portability of Fortran’s ‘do concurrent’ on GPUs

Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, Nicholas Malaya

Predictive Science Inc., San Diego, CA USA

arXiv:2408.07843 [cs.PL], (14 Aug 2024)

DOI:10.48550/arXiv.2408.07843

BibTeX

Download (PDF)

View

Source

Source codes

Package:

HipFT: High-performance Flux Transport

1698

views

There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the {tt do concurrent} (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code, HipFT. We discuss implementation and compilation details, including when/where using directive APIs for data movement is needed/desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown.

Tags: Computer science, Fortran, Intel, Intel Data Center GPU Max 1550, Intel Ponte Vecchio Max 1100, nVidia, nVidia A100, nVidia GH200, nVidia H100, OpenACC, OpenMP, Package

August 18, 2024 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Portability of Fortran’s ‘do concurrent’ on GPUs

Package:

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Most viewed papers (last 30 days)

Portability of Fortran’s ‘do concurrent’ on GPUs

Package:

Share this:

Recent source codes

Most viewed papers (last 30 days)