On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Calvin Roth, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Marko Knezevic, Gavin Whetstone, Zachary Baker, Robert Robey

Engineering Technology & Design Division, Los Alamos National Laboratory, Los Alamos

Information, Volume 15, Issue 11, 2024

DOI:10.3390/info15110673

Package: MATAR

This paper presents software advances that make it easy to exploit multi-core CPU and CPU+GPU architectures to accelerate diverse high-performance computing (HPC) applications from a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library, which uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. Portability across architectures with a single code implementation is achieved by automatically switching between fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, and pthreads) at compile time. The MATAR library solves many longstanding challenges in writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes, and it also addresses the challenge of quickly making existing Fortran codes performant and portable across modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR so that they are performant, parallel, and portable across diverse computer architectures.
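
The two ideas in the abstract, a contiguous multi-dimensional array with Fortran-like `a(i, j)` indexing and a loop whose execution backend is chosen at compile time, can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `Array2D`, `sketch_parallel_for`, and `fill_and_sum` are hypothetical names invented here and are not the MATAR or Kokkos API, the layout is assumed to be row-major, and the only backends shown are OpenMP (when the compiler defines `_OPENMP`) and a serial fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: none of these names come from MATAR or Kokkos.

// A 2D array stored in one contiguous buffer (assumed row-major), indexed
// with a(i, j) in the spirit of Fortran-style array syntax.
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);  // bounds check in debug builds
        return data_[i * cols_ + j];
    }

    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// The loop body is written once; the preprocessor decides at compile time
// whether the iterations run under OpenMP or serially.
template <typename Body>
void sketch_parallel_for(std::size_t begin, std::size_t end, Body body) {
#if defined(_OPENMP)
    #pragma omp parallel for
#endif
    for (long i = static_cast<long>(begin); i < static_cast<long>(end); ++i) {
        body(static_cast<std::size_t>(i));
    }
}

// Example use: set a(i, j) = i + j with rows distributed across threads
// (each row is written by exactly one iteration), then sum serially.
double fill_and_sum(std::size_t rows, std::size_t cols) {
    Array2D a(rows, cols);
    sketch_parallel_for(0, rows, [&](std::size_t i) {
        for (std::size_t j = 0; j < cols; ++j) {
            a(i, j) = static_cast<double>(i + j);
        }
    });

    double total = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            total += a(i, j);
        }
    }
    return total;
}
```

Compiling the same source with or without an OpenMP flag (for example `-fopenmp` on GCC or Clang) changes how the loop executes without touching the calling code, which is the single-source property the abstract describes, here reduced to two backends.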

Tags: Computer science, CUDA, Fortran, HIP, nVidia, Package, performance portability, Pthreads

November 10, 2024 by hgpu
