Automating Heterogeneous Parallelism in Numerical Differential Equations
Massachusetts Institute of Technology
Massachusetts Institute of Technology, 2024
@phdthesis{edelman2024automating,
  title  = {Automating Heterogeneous Parallelism in Numerical Differential Equations},
  author = {Edelman, Alan and others},
  year   = {2024},
  school = {Massachusetts Institute of Technology}
}
Scientific computing is an amalgamation of numerical methods and computer science. Developments in numerical analysis have produced stable and accurate numerical schemes, while computer algorithms have been successfully adapted to today's standard multicore systems, enabling parallelism. Combining efficient numerical algorithms with efficient parallelism remains a challenge, mainly because these fields developed independently, and it is therefore typically solved on a domain-specific basis by domain experts. General-purpose tools that integrate parallelism into algorithms, accessible through high-level languages, signify the future direction for addressing computational demands across various domains. This thesis represents a culmination of efforts in general-purpose parallel numerical algorithms for solving differential equations, made accessible by implementing the high-level framework in the Julia programming language.

Solving differential equations appears to be an intrinsically serial process: progressive time-stepping is difficult to parallelize. Most approaches fall into two broad categories. The first is parallelism within the solver operations, making each individual solve faster; the second is parallelism across solves, i.e., solving multiple problems in batches. We automate the parallelization process in both of these domains while keeping the algorithms general-purpose, and we investigate parallelization on different hardware, such as CPUs and GPUs.

Parallelism for sufficiently large stiff ODEs is traditionally tied to parallelizing the matrix factorization stage, but for ODEs with fewer than approximately 200 states, these methods cannot overcome the threading overhead. We propose adaptive-order, adaptive-time-stepping stiff ODE solvers, such as extrapolation methods, which can parallelize a single instance of an ODE solve even for small ODEs.

The other need for parallelizing ODE solvers arises from solving ODEs over batches of data, a typical workflow in inverse problems, global sensitivity analysis, and uncertainty quantification. Traditionally, GPU-accelerated ODE solvers were developed specifically for high-dimensional PDE systems, and they can be adapted to batched ODE solving by converting an array-based ODE solver to work with GPU-based arrays. These approaches have shortcomings, such as implicitly synchronizing the time steps of all the ODEs in the batch and incurring GPU overheads. We propose significantly improved approaches in which GPU acceleration for ODE solvers is device-agnostic, general-purpose, and accessible from a high-level language.
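As a concrete illustration of the within-solve parallelism described above (not part of the abstract itself): threaded implicit extrapolation solvers of this kind are available in Julia's OrdinaryDiffEq.jl. The following is a minimal sketch under that assumption; the specific solver (ImplicitHairerWannerExtrapolation) and threading option (PolyesterThreads) are names from that package, not from the abstract.

using OrdinaryDiffEq

# ROBER: a classic stiff ODE with only 3 states, i.e. far below the
# ~200-state scale where factorization-level threading pays off
function rober!(du, u, p, t)
    y1, y2, y3 = u
    k1, k2, k3 = p
    du[1] = -k1 * y1 + k3 * y2 * y3
    du[2] = k1 * y1 - k2 * y2^2 - k3 * y2 * y3
    du[3] = k2 * y2^2
    return nothing
end

prob = ODEProblem(rober!, [1.0, 0.0, 0.0], (0.0, 1e5), (0.04, 3e7, 1e4))

# Adaptive-order implicit extrapolation: the independent subproblems of the
# extrapolation tableau have no data dependencies among themselves, so they
# can be evaluated on separate threads within a single solve
alg = ImplicitHairerWannerExtrapolation(threading = OrdinaryDiffEq.PolyesterThreads())
sol = solve(prob, alg)

Here the parallelism lives inside each time step rather than across time steps, which is what lets a single solve of even a 3-state stiff system use multiple cores.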
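Similarly, the batched, kernel-per-trajectory style of GPU solving can be sketched with Julia's DiffEqGPU.jl; the package and the EnsembleGPUKernel, GPUTsit5, and CUDABackend names are assumptions beyond what the abstract states.

using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA

# Lorenz system in out-of-place form over static arrays, as required for
# compiling each trajectory into its own GPU kernel
function lorenz(u, p, t)
    σ, ρ, β = p
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0, 0.0f0, 0.0f0]
p = @SVector [10.0f0, 28.0f0, 8.0f0 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, (0.0f0, 10.0f0), p)

# Each trajectory gets its own randomized parameters and runs as an
# independent kernel with its own adaptive time steps, avoiding the
# implicit cross-ODE time-step synchronization of array-based batching
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
eprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

sol = solve(eprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()),
            trajectories = 10_000, saveat = 1.0f0)

Because the backend object is the only device-specific piece, swapping CUDA.CUDABackend() for another vendor's KernelAbstractions backend is what makes this style of GPU acceleration device-agnostic.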
July 14, 2024 by hgpu