
Compiler-assisted distribution of OpenMP code for improved scalability

Jannek Squar
Fakultät für Mathematik, Informatik und Naturwissenschaften der Universität Hamburg
Universität Hamburg, 2023

@phdthesis{squar2023compiler,
   title={Compiler-assisted distribution of OpenMP code for improved scalability},
   author={Squar, Jannek},
   year={2023},
   school={Staats- und Universit{\"a}tsbibliothek Hamburg Carl von Ossietzky}
}

High performance computing is a complex field, with many homogeneous and heterogeneous hardware architectures and numerous programming paradigms, libraries and compilers. OpenMP and netCDF are relatively widely used in Earth system research because they are comparatively easy to learn and yet can exploit the potential of a single compute node. However, Earth system scientists without the appropriate training may find it difficult to run their applications on a distributed HPC infrastructure. As Earth system applications generally benefit from being able to run on large input problems, they would particularly benefit from HPC features such as process parallelisation, data reduction or parallel input and output. Using these features is not trivial, however, and requires a lot of experience and work. To support these scientists, this dissertation develops a tool that lets them quickly apply useful HPC frameworks without first having to deal with the implementation, by automatically incorporating the necessary code changes into their application. Different approaches are considered that can be used to automatically traverse, analyse and transform code. Based on this, the design of a new tool is presented: CATO is based on the LLVM framework and uses its rich API for automatic code analysis and transformation to add new features to an application. CATO analyses the existing OpenMP kernels of an application and transforms them into equivalent MPI code so that they can be executed on distributed memory systems. If the application also uses netCDF, it can be automatically adapted to use the data compression and parallel input/output features of the netCDF library. In this way, users can test the effect of the HPC concepts mentioned without having to adapt their application by hand. The evaluation of CATO is based on a PDE solver as well as on netCDF microbenchmarks to examine the functionality and performance of the modified applications.
The tests showed no runtime performance benefit, because of the additional overhead caused by CATO. However, the modified application can now use the aggregated memory of multiple nodes, and the memory consumption per process is optimised. In addition, the memory footprint as well as the runtime of the I/O phase of the modified application can be significantly improved by using parallel I/O. Through the automatic integration of netCDF compression algorithms, users can also decide at runtime to compress their output, which can significantly reduce the storage space consumed in the file system.
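
To illustrate the kind of transformation the abstract describes, the following is a minimal sketch of how the iteration space of a shared-memory parallel loop might be split into contiguous blocks, one per process, with the partial results combined afterwards. It is not CATO's actual scheme: the function names `block_range` and `distributed_sum_of_squares` are hypothetical, the processes are simulated in a plain Python loop, and the final `sum` merely stands in for an MPI reduction.

```python
def block_range(n_iters, n_procs, rank):
    """Return the half-open range [start, stop) of iterations owned by `rank`.

    Hypothetical helper: iterations are split into contiguous blocks, and the
    first `n_iters % n_procs` ranks receive one extra iteration each.
    """
    base, extra = divmod(n_iters, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_sum_of_squares(data, n_procs):
    """Simulate a reduction loop whose iterations are spread over processes."""
    partials = []
    for rank in range(n_procs):
        # Each simulated process only touches its own block of the data,
        # which is why the per-process memory footprint can shrink.
        start, stop = block_range(len(data), n_procs, rank)
        partials.append(sum(x * x for x in data[start:stop]))
    # Assumption: combining the partial results stands in for an MPI reduction.
    return sum(partials)


if __name__ == "__main__":
    values = list(range(10))
    print([block_range(len(values), 4, r) for r in range(4)])
    print(distributed_sum_of_squares(values, 4))
```

In a real distributed run each rank would hold only its block and the ranks would communicate to combine results; that communication is one plausible source of the overhead mentioned in the evaluation.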