high performance computing on graphics processing units: hgpu.org

Compiler-assisted distribution of OpenMP code for improved scalability

Jannek Squar

Fakultät für Mathematik, Informatik und Naturwissenschaften der Universität Hamburg

Universität Hamburg, 2023

Package: CATO: automatic source transformation to apply HPC frameworks with minimal user interaction

High performance computing is a complex field, with many homogeneous and heterogeneous hardware architectures and numerous programming paradigms, libraries and compilers. OpenMP and netCDF are relatively widely used in Earth system research because they are comparatively easy to learn and yet can exploit the potential of a single compute node. However, Earth system scientists without the appropriate training may find it difficult to run their applications on a distributed HPC infrastructure. Because Earth system applications generally benefit from being able to handle large input problems, they would particularly benefit from HPC features such as process parallelisation, data reduction and parallel input/output. Using these features is not trivial, however, and requires considerable experience and effort. To support these scientists, this dissertation develops a tool that lets them quickly apply useful HPC frameworks without first having to deal with the implementation: the tool automatically incorporates the necessary code changes into their application. Different approaches to automatically traversing, analysing and transforming code are considered. Based on these, the design of a new tool is presented: CATO builds on the LLVM framework and uses its rich API for automatic code analysis and transformation to add new features to an application. CATO analyses the existing OpenMP kernels of an application and transforms them into equivalent MPI code so that they can be executed on distributed memory systems. If the application also uses netCDF, it can be automatically adapted to use the data compression and parallel input/output features of the netCDF library. In this way, users can test the effect of these HPC concepts without having to adapt their application by hand.

The evaluation of CATO uses a PDE solver and netCDF microbenchmarks to examine the functionality and performance of the modified applications. The tests showed no runtime performance benefit, because of the additional overhead introduced by CATO. However, the modified application can now use the aggregated memory of multiple nodes, and the memory consumption per process is optimised. In addition, both the memory footprint and the runtime of the I/O phase of the modified application can be significantly improved by using parallel I/O. Through the automatic integration of netCDF compression algorithms, users can also decide at runtime to compress their output, which can significantly reduce the space consumed in the file system.
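To make the idea of turning an OpenMP kernel into distributed-memory code more concrete, here is a minimal sketch in C. It is not taken from CATO and does not show what CATO generates; the abstract does not describe the transformation in that detail. The names block_range, block_partition, sum_squares_shared, sum_squares_local and sum_squares_simulated are hypothetical, and the block decomposition is only one common way to split a loop across processes. To stay runnable without an MPI installation, the sketch loops over ranks inside a single process where a real MPI program would run one process per rank and combine the partial results with a reduction such as MPI_Allreduce.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) owned by one process (hypothetical type). */
typedef struct {
    size_t begin;
    size_t end;
} block_range;

/* Hypothetical helper: split n loop iterations into nranks contiguous blocks.
   The first (n % nranks) ranks receive one extra iteration each. */
block_range block_partition(size_t n, int rank, int nranks)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    size_t r = (size_t)rank;
    size_t p = (size_t)nranks;
    size_t base = n / p;
    size_t extra = n % p;
    block_range out;
    out.begin = r * base + (r < extra ? r : extra);
    out.end = out.begin + base + (r < extra ? 1 : 0);
    return out;
}

/* Shared-memory version: one OpenMP loop over the whole array.
   Without OpenMP support the pragma is skipped and the loop runs serially. */
double sum_squares_shared(const double *x, size_t n)
{
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sum;
}

/* Work a single process would do in a distributed version:
   it visits only the iterations in its own block. */
double sum_squares_local(const double *x, size_t n, int rank, int nranks)
{
    block_range b = block_partition(n, rank, nranks);
    double local = 0.0;
    for (size_t i = b.begin; i < b.end; ++i)
        local += x[i] * x[i];
    return local;
}

/* Stand-in for the distributed run (assumption: no MPI available here).
   Adding the partial sums plays the role of the reduction step. */
double sum_squares_simulated(const double *x, size_t n, int nranks)
{
    double total = 0.0;
    for (int rank = 0; rank < nranks; ++rank)
        total += sum_squares_local(x, n, rank, nranks);
    return total;
}
```

In this sketch every rank still sees the whole array x. In a real distributed-memory program each process would typically hold only its own block, which is how a code of this kind can draw on the aggregated memory of several nodes.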

Tags: Compression, Computer science, Heterogeneous systems, LLVM, MPI, OpenMP, Package, Thesis

September 24, 2023 by hgpu
