Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications
University of Tennessee, Knoxville
University of Tennessee, 2024
@article{tan2024enhancing,
title={Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications},
author={Tan, Nigel},
year={2024}
}
The growing diversity of hardware and software stacks creates new development challenges for high-performance software as we move to exascale systems. Re-engineering software for each new platform is no longer practical due to increasing heterogeneity. Hardware designers are prioritizing AI/ML features such as reduced precision, which increase performance but sacrifice accuracy. The growing scale of simulations and the associated checkpointing frequency exacerbate the I/O overhead and storage cost challenges already present in petascale systems. Moving forward, the community must address performance portability, precision optimization, and data deduplication challenges to ensure that exascale applications efficiently deliver scientific discovery. In this dissertation, we address each challenge posed to exascale applications by emerging heterogeneous hardware. We enhance the performance portability of the Vector Particle-In-Cell (VPIC) application using the Kokkos portability framework, achieving performance gains across different platforms. We develop techniques for leveraging lower-precision formats such as 16-bit floating-point and 16-bit fixed-point in the VPIC application while preserving scientific accuracy, demonstrating scientific findings similar to those obtained with 32-bit floating-point. We design a Merkle tree-based data deduplication method that prunes spatiotemporal redundancy and creates a compact metadata representation for incremental checkpointing, mitigating I/O overhead and reducing storage costs.
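To make the idea of Merkle tree-based incremental checkpointing concrete, the following is a minimal sketch and not the dissertation's actual method. It covers only temporal redundancy between two consecutive checkpoints of equal size. The chunk size, the use of SHA-256, and the function names `build_tree` and `changed_chunks` are assumptions made for illustration.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size in bytes (tiny, for illustration only)


def build_tree(data: bytes, chunk: int = CHUNK) -> list[list[bytes]]:
    """Return the Merkle tree as a list of levels: leaves first, root last."""
    leaves = [hashlib.sha256(data[i:i + chunk]).digest()
              for i in range(0, len(data), chunk)]
    if not leaves:  # empty buffer: keep a single leaf so a root always exists
        leaves = [hashlib.sha256(b"").digest()]
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent hashes its (up to two) children; an unpaired last
        # node is hashed on its own.
        levels.append([hashlib.sha256(b"".join(prev[i:i + 2])).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def changed_chunks(old: list[list[bytes]], new: list[list[bytes]]) -> list[int]:
    """Indices of leaf chunks that differ, found by descending from the root."""
    if len(old[0]) != len(new[0]):
        # Assumption: differently sized checkpoints are treated as fully changed.
        return list(range(len(new[0])))
    top = len(new) - 1
    frontier = [0] if old[top][0] != new[top][0] else []
    for level in range(top - 1, -1, -1):
        nxt = []
        for node in frontier:
            for child in (2 * node, 2 * node + 1):
                # Only subtrees whose hash changed are explored further.
                if child < len(new[level]) and old[level][child] != new[level][child]:
                    nxt.append(child)
        frontier = nxt
    return frontier


ckpt0 = b"aaaabbbbccccdddd"
ckpt1 = b"aaaabbbbXXXXdddd"  # only the third chunk differs
diff = changed_chunks(build_tree(ckpt0), build_tree(ckpt1))
# The incremental checkpoint stores only the changed chunks.
incremental = {i: ckpt1[i * CHUNK:(i + 1) * CHUNK] for i in diff}
```

Because unchanged subtrees share a hash with the previous checkpoint, the comparison skips them entirely, and only the chunks listed in `diff` need to be written to storage.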
September 15, 2024 by hgpu