high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Automating Heterogeneous Parallelism in Numerical Differential Equations

Automating Heterogeneous Parallelism in Numerical Differential Equations

Utkarsh

Massachusetts Institute of Technology

Massachusetts Institute of Technology, 2024

BibTeX

Download (PDF)

View

Source

Source codes

Package:

GPUODEBenchmarks: Comparsion of Julia’s GPU Kernel based ODE solvers with other open-source GPU ODE solvers

917

views

Scientific computing is an amalgamation of numerical methods and computer science. Developments in numerical analysis have allowed stable and accurate numerical schemes, whereas computer algorithms have been successfully adopted to standard multicore systems of today, enabling parallelism. Combining efficient numerical algorithms with efficient parallelism presents a challenge mainly due to the independent development of these fields and is, therefore, typically solved on a domain-specific basis by domain experts. The development of general-purpose tools that integrate parallelism into algorithms, accessible through highlevel languages, signifies the future direction for addressing computational demands across various domains. This thesis work represents a culmination of efforts in general-purpose parallel numerical algorithms for solving differential equations. We make them accessible by choosing the Julia programming language to implement the high-level framework. Solving differential equations appears to be an intrinsically serial process due to progressive time-stepping that proves challenging to parallelize. Most of the approaches are linked to two broad categories; The first is the parallelism of the solver operations by making each solve faster, and the latter is the parallelism between the solves, i.e., solving multiple batches at a time. We automate the parallelization process in both these domains while keeping the algorithms general-purpose. Parallelization with different hardware accelerators, such as CPUs and GPUs, is also investigated. Parallelism for sufficiently large stiff ODEs is traditionally linked to the parallelization of the matrix factorization stage. However, these methods still need to overcome the threading overhead for ODEs having less than approximately 200 states. We propose implementing adaptive-order, adaptive time-stepping stiff ODE solvers such as extrapolation methods, which can parallelize a single instance of an ODE solve even for small ODEs. The other need for parallelization of ODE solvers arises from solving ODEs for batches of data, a typical workflow in inverse problems, global sensitivity analysis, and uncertainty quantification. Traditionally, GPU-accelerated ODE solvers were specially developed for high-dimensional PDE systems, which can be easily adapted for batched ODE solvers. The approach for parallelization is to convert an array-based ODE solver to work with GPU-based arrays. These approaches have shortcomings, such as implicit synchronization of time steps for all the ODEs and GPU overheads. We propose that these approaches can be improved significantly where GPU acceleration for ODE solvers is device-agnostic, general-purpose, and accessible from a high-level language.

Tags: AMD RX Vega 64, ATI, Computer science, CUDA, Differential equations, Factorization, Heterogeneous systems, Intel, Intel Arc A770, Julia, Numerical Analysis, nVidia, nVidia Quadro RTX 5000, Package, Tesla V100, Thesis

July 14, 2024 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Automating Heterogeneous Parallelism in Numerical Differential Equations

Package:

Your response

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

Automating Heterogeneous Parallelism in Numerical Differential Equations

Package:

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)