Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Peter Thoman, Philip Salzmann

Distributed and Parallel Systems Group, University of Innsbruck, Technikerstraße 21a, Innsbruck 6020, Tirol, Austria

SN Computer Science, Volume 5, 409, 2024

DOI:10.1007/s42979-024-02749-w

Package: Celerity (High-level C++ for Accelerator Clusters)

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means that the runtime system has to keep track of both fine-grained task dependencies and data residency meta-information. The amount of such meta-information is proportional to the granularity of parallelism which needs to be managed, introducing a trade-off. More precise tracking of data state allows leveraging more opportunities for compute and transfer parallelism, while also introducing more overhead. As such, the fidelity of the information being tracked needs to be managed carefully, ideally without introducing additional latency, communication or substantial compute overhead. We present the “Horizons” approach, designed to fulfill these goals. Specifically, horizons allow for the effective and efficient management of parallelism and the coalescing of previous fine-grained tracking information while maintaining an easily configurable scheduling window with full information precision. As an additional benefit, they provide consistent cluster-wide decision points without requiring any inter-node communication, and effectively cap the size of state tracking data structures even in the presence of problematic access patterns. Experimental evaluation on microbenchmarks and dry runs demonstrates that horizons are effective in keeping the scheduling complexity constant, while their own overhead is negligible—below 10μs per horizon when building a command graph for 512 GPUs. We additionally demonstrate the performance impact of horizons—as well as their low overhead—on a real-world application.
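The coalescing idea can be illustrated with a toy model. The abstract does not spell out the mechanism, so the following is only a minimal sketch under stated assumptions, not Celerity's API or the authors' implementation: a hypothetical `ToyTracker` records the last writer of each buffer element, emits a synthetic horizon task whenever the critical path has grown by `horizon_step`, and, once a newer horizon exists, replaces every tracked writer older than the previous horizon with that horizon. All class, method, and parameter names are invented for illustration. Because the decision depends only on the locally built graph, every node running the same code would place horizons identically without exchanging messages.

```python
class ToyTracker:
    """Toy last-writer tracker with horizon-style coalescing (illustrative only)."""

    def __init__(self, horizon_step=None):
        self.horizon_step = horizon_step  # assumed knob; None disables horizons
        self.next_id = 0
        self.last_writer = {}        # element index -> id of the task that last wrote it
        self.depth = {}              # task id -> critical-path length up to that task
        self.front = set()           # tasks that no other task depends on yet
        self.last_horizon_depth = 0  # critical-path length at the newest horizon
        self.pending_horizon = None  # newest horizon, not yet used for coalescing

    def _add(self, deps):
        tid = self.next_id
        self.next_id += 1
        self.depth[tid] = 1 + max((self.depth[d] for d in deps), default=0)
        self.front -= set(deps)
        self.front.add(tid)
        return tid

    def submit(self, reads, writes):
        # True dependencies only: depend on the last writer of every element read.
        deps = {self.last_writer[e] for e in reads if e in self.last_writer}
        tid = self._add(deps)
        for e in writes:
            self.last_writer[e] = tid
        if (self.horizon_step
                and self.depth[tid] - self.last_horizon_depth >= self.horizon_step):
            self._emit_horizon()
        return tid, deps

    def _emit_horizon(self):
        previous = self.pending_horizon
        # The horizon depends on the whole execution front, so it is ordered
        # after every task submitted before it.
        hid = self._add(set(self.front))
        self.last_horizon_depth = self.depth[hid]
        self.pending_horizon = hid
        if previous is not None:
            # Coalesce: anything older than the previous horizon is now
            # represented by that horizon. Tasks between the two horizons keep
            # full precision, which forms the scheduling window.
            for e, w in list(self.last_writer.items()):
                if w < previous:
                    self.last_writer[e] = previous


def run_chain(n, horizon_step):
    """Problematic pattern: task i reads element i-1 and writes element i."""
    tracker = ToyTracker(horizon_step)
    for i in range(n):
        tracker.submit(reads=[i - 1] if i > 0 else [], writes=[i])
    return tracker


def distinct_writers(tracker):
    """Number of distinct task ids the tracker still has to reference."""
    return len(set(tracker.last_writer.values()))
```

Calling `distinct_writers(run_chain(200, None))` and `distinct_writers(run_chain(200, 4))` contrasts the two cases: without horizons the tracker references one task per element, while with horizons the count stays bounded by roughly two windows of tasks regardless of how long the chain runs. A later read of an old element then depends on a single horizon instead of the original writer, which is still correct in this model because the horizon is ordered after that writer.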

Tags: Benchmarking, Computer science, GPU cluster, HPC, nVidia, nVidia V100, Package, SYCL

April 14, 2024 by hgpu
