
Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Peter Thoman, Philip Salzmann
Distributed and Parallel Systems Group, University of Innsbruck, Technikerstraße 21a, Innsbruck 6020, Tirol, Austria
SN Computer Science, Volume 5, 409, 2024

@article{thoman2024balancing,
  title={Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach},
  author={Thoman, Peter and Salzmann, Philip},
  journal={SN Computer Science},
  volume={5},
  number={4},
  pages={409},
  year={2024},
  publisher={Springer}
}

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means that the runtime system has to keep track of both fine-grained task dependencies and data residency meta-information. The amount of such meta-information is proportional to the granularity of parallelism that needs to be managed, introducing a trade-off. More precise tracking of data state allows leveraging more opportunities for compute and transfer parallelism, while also introducing more overhead. As such, the fidelity of the information being tracked needs to be managed carefully, ideally without introducing additional latency, communication or substantial compute overhead. We present the “Horizons” approach, designed to fulfill these goals. Specifically, horizons allow for the effective and efficient management of parallelism and the coalescing of previous fine-grained tracking information while maintaining an easily configurable scheduling window with full information precision. As an additional benefit, they provide consistent cluster-wide decision points without requiring any inter-node communication, and effectively cap the size of state tracking data structures even in the presence of problematic access patterns. Experimental evaluation on microbenchmarks and dry runs demonstrates that horizons are effective in keeping the scheduling complexity constant, while their own overhead is negligible: below 10 μs per horizon when building a command graph for 512 GPUs. We additionally demonstrate the performance impact of horizons, as well as their low overhead, on a real-world application.
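To make the coalescing idea more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The class `HorizonTaskGraph`, its `horizon_step` parameter and the per-element `last_writer` table are hypothetical. For illustration we assume that a new horizon task is generated whenever the longest dependency chain has grown by `horizon_step` since the last horizon, that the horizon depends on every task without successors, and that generating a new horizon "applies" the previous one: all older tasks are folded into it and dropped, so tracking state stays bounded while the window between the two most recent horizons keeps full per-task precision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    tid: int
    deps: set[int]      # ids of tasks this task directly depends on
    cp_len: int         # length of the longest dependency chain ending here
    is_horizon: bool = False


class HorizonTaskGraph:
    """Hypothetical sketch: per-element last-writer tracking for one 1-D buffer,
    with horizons bounding the amount of live tracking state.
    Simplification: only read-after-write and write-after-write dependencies
    are modeled; anti-dependencies are omitted."""

    def __init__(self, num_elements: int, horizon_step: int = 2):
        self.horizon_step = horizon_step               # assumed tuning knob
        self.tasks: dict[int, Task] = {}               # live tasks by id
        self.last_writer: list[Optional[int]] = [None] * num_elements
        self.frontier: set[int] = set()                # tasks with no successors yet
        self.next_id = 0
        self.max_cp = 0
        self.horizon_cp = 0                            # cp_len of the newest horizon
        self.newest_horizon: Optional[int] = None

    def _add_task(self, deps: set[int], is_horizon: bool = False) -> int:
        tid = self.next_id
        self.next_id += 1
        cp = 1 + max((self.tasks[d].cp_len for d in deps), default=0)
        self.tasks[tid] = Task(tid, set(deps), cp, is_horizon)
        self.frontier -= deps
        self.frontier.add(tid)
        self.max_cp = max(self.max_cp, cp)
        return tid

    def submit(self, reads: list[int], writes: list[int]) -> int:
        accessed = set(reads) | set(writes)
        deps = {self.last_writer[i] for i in accessed if self.last_writer[i] is not None}
        tid = self._add_task(deps)
        for i in writes:
            self.last_writer[i] = tid
        # Assumed trigger: the critical path grew by horizon_step since the last horizon.
        if self.max_cp - self.horizon_cp >= self.horizon_step:
            self._generate_horizon()
        return tid

    def _generate_horizon(self) -> None:
        previous = self.newest_horizon
        # Depending on the whole frontier makes the horizon a transitive
        # successor of every task submitted so far.
        self.newest_horizon = self._add_task(set(self.frontier), is_horizon=True)
        self.horizon_cp = self.tasks[self.newest_horizon].cp_len
        if previous is not None:
            self._apply_horizon(previous)

    def _apply_horizon(self, h: int) -> None:
        # Every task with an id below h precedes h, so h can stand in for it.
        self.last_writer = [h if w is not None and w < h else w for w in self.last_writer]
        for tid in [t for t in self.tasks if t < h]:
            del self.tasks[tid]
        for task in self.tasks.values():
            old = {d for d in task.deps if d < h}
            if old:
                task.deps -= old
                if task.tid != h:          # h itself becomes the new root
                    task.deps.add(h)


# Usage: a chain in which task i reads element i-1 and writes element i.
with_horizons = HorizonTaskGraph(num_elements=64, horizon_step=2)
without = HorizonTaskGraph(num_elements=64, horizon_step=10**9)  # never triggers
for i in range(1000):
    for g in (with_horizons, without):
        g.submit(reads=[(i - 1) % 64], writes=[i % 64])
print(len(with_horizons.tasks), len(without.tasks))
```

In this toy chain the graph without horizons retains all 1000 tasks, while the horizon version keeps only a handful live, and every surviving `last_writer` entry and dependency points at a live task. Because horizon placement here depends only on the submitted task sequence, two nodes replaying the same submissions would place identical horizons without exchanging messages; this is one plausible reading of the communication-free, cluster-wide decision points described above, not a statement of how the paper implements them.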
