HPX – The C++ Standard Library for Parallelism and Concurrency
Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, United States of America
Journal of Open Source Software, 5(53), 2352, 2020
@article{kaiser2020hpx,
title={HPX-The C++ Standard Library for Parallelism and Concurrency},
author={Kaiser, Hartmut and Diehl, Patrick and Lemoine, Adrian S and Lelbach, Bryce Adelstein and Amini, Parsa and Berge, Agust{\'i}n and Biddiscombe, John and Brandt, Steven R and Gupta, Nikunj and Heller, Thomas and others},
journal={Journal of Open Source Software},
volume={5},
number={53},
pages={2352},
year={2020}
}
The new challenges presented by exascale system architectures have made it difficult to achieve the desired scalability using traditional distributed-memory runtimes. Asynchronous many-task (AMT) systems are based on a new paradigm that shows promise in addressing these challenges, providing application developers with a productive and performant approach to programming on next-generation systems. HPX is a C++ library for concurrency and parallelism developed by The STE||AR Group, an international group of collaborators working in the field of distributed and parallel programming (Heller, Diehl, Byerly, Biddiscombe, & Kaiser, 2017; Kaiser et al., n.d.; Tabbal, Anderson, Brodowicz, Kaiser, & Sterling, 2011). It is a runtime system written using modern C++ techniques that is linked as part of an application. HPX exposes extended services and functionalities supporting the implementation of parallel, concurrent, and distributed capabilities for applications in any domain; it has been used in scientific computing, gaming, finance, data mining, and other fields.
The HPX AMT runtime system attempts to solve some of the problems the community faces when creating scalable parallel applications that expose excellent parallel efficiency and high resource utilization. First, it exposes a C++ standards-conforming API that unifies syntax and semantics for local and remote operations. This significantly simplifies writing code that strives to use the different types of parallelism available in today’s machines (i.e., on-node, off-node, and accelerator-based parallelism) effectively and in a coordinated way. Second, HPX implements an asynchronous C++ standard programming model that has the emergent property of semi-automatic parallelization of the user’s code. The provided API (especially when used in conjunction with the new C++20 co_await keyword (Standard ISO/IEC, 2020)) enables intrinsic overlap of computation and communication, prefers moving work to data over moving data to work, and exposes minimal overheads from its lightweight threading subsystem, ensuring efficient fine-grained parallelization and minimal-overhead synchronization and context switching. This programming model natively ensures high system utilization and perfect scalability.
A detailed comparison of HPX with various other AMTs is given by Thoman et al. (2018). Some notable AMT solutions are Uintah (Germain, McCorquodale, Parker, & Johnson, 2000), Chapel (Chamberlain, Callahan, & Zima, 2007), Charm++ (Kale & Krishnan, 1993), Kokkos (Edwards, Trott, & Sunderland, 2014), Legion (Bauer, Treichler, Slaughter, & Aiken, 2012), and PaRSEC (Bosilca et al., 2013). Note that we refer only to distributed-memory solutions, since distributed-memory support is an important feature for scientific applications that run large-scale simulations. The main advantage of HPX over the distributed AMTs mentioned above is its future-proof, C++ standards-conforming API and the asynchronous programming model it exposes.
HPX’s main goal is to improve the efficiency and scalability of parallel applications by increasing resource utilization and reducing synchronization overheads, which it does by providing an asynchronous API and employing adaptive scheduling. The consistent use of futures intrinsically enables overlap of computation and communication as well as constraint-based synchronization. HPX is able to maintain a balanced load among all available resources, which significantly reduces processor starvation and effective latencies while keeping overheads under control. HPX fully conforms to the C++ ISO standards and implements the standardized concurrency mechanisms and parallelism facilities. Further, HPX extends those facilities to distributed use cases, thus enabling syntactic and semantic equivalence of local and remote operations at the API level. HPX uses the concept of C++ futures to transform sequential algorithms into wait-free asynchronous executions. This futurization enables the automatic creation of dynamic data-flow execution trees of potentially millions of lightweight HPX tasks that are executed in the proper order. HPX also provides a work-stealing task scheduler that takes care of fine-grained parallelization and automatic load balancing. Furthermore, HPX implements functionalities proposed as part of the ongoing C++ standardization process.
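The futurization idea described above can be illustrated with a minimal sketch. It deliberately uses only the ISO C++ standard library (std::async and std::future), not HPX itself, on the assumption that the standards-conforming interface means the HPX counterparts (hpx::async and hpx::future) are used in the same way. The function names chunk_sum and futurized_sum are hypothetical and chosen only for this illustration. Unlike the continuation-based, wait-free style the text attributes to HPX, this simplified version blocks in get() when it collects the results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the half-open index range [first, last) of data.
// This is the unit of work that each asynchronous task executes.
long long chunk_sum(const std::vector<int>& data, std::size_t first, std::size_t last) {
    return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(first),
                           data.begin() + static_cast<std::ptrdiff_t>(last), 0LL);
}

// Hypothetical futurized reduction: the sequential sum is split into chunks,
// each chunk is launched as an asynchronous task, and every task returns a
// future that stands in for a partial result not yet computed.
long long futurized_sum(const std::vector<int>& data, std::size_t num_tasks) {
    if (num_tasks == 0) {
        num_tasks = 1;  // guard against division by zero below
    }
    // Ceiling division so that every element belongs to exactly one chunk.
    const std::size_t chunk = (data.size() + num_tasks - 1) / num_tasks;

    std::vector<std::future<long long>> parts;
    for (std::size_t first = 0; first < data.size(); first += chunk) {
        const std::size_t last = std::min(first + chunk, data.size());
        // std::launch::async requests that the task runs asynchronously
        // instead of being deferred until get() is called.
        parts.push_back(std::async(std::launch::async, chunk_sum,
                                   std::cref(data), first, last));
    }

    // Synchronization point: get() blocks until each partial result is ready.
    long long total = 0;
    for (auto& part : parts) {
        total += part.get();
    }
    return total;
}
```

Calling futurized_sum(values, 4) on a vector named values launches up to four tasks and combines their partial sums. The standard library typically runs each such task on its own operating-system thread, whereas the text describes HPX as scheduling lightweight tasks with a work-stealing scheduler, so the sketch shows the programming model rather than HPX's performance characteristics.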
September 13, 2020 by hgpu