
Leveraging the potential of task-based programming with OpenMP task graphs

Chenle Yu
Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya
Universitat Politècnica de Catalunya, 2024

@article{yu2024leveraging,
   title={Leveraging the potential of task-based programming with OpenMP task graphs},
   author={Yu, Chenle and others},
   publisher={Universitat Polit{\`e}cnica de Catalunya},
   year={2024}
}

The task execution model is widely used in computer engineering; it helps developers design, develop, and understand software systems. OpenMP is the de facto programming model for parallelizing sequential algorithms on shared-memory machines. Coupled with task parallelism, OpenMP can conveniently parallelize structured and non-structured applications, and it also allows users to offload work onto accelerators as target tasks. However, the runtime overhead incurred by the OpenMP tasking model is an important concern for users developing OpenMP task programs. This work focuses on improving the OpenMP tasking model.

Firstly, we analyzed the performance overhead and bottlenecks of mainstream task implementations and proposed a solution within the OpenMP specification to tackle them. We observe that a significant portion of the overhead in the tasking model stems from thread contention: multiple threads compete to access shared resources, such as task queues, at the same time, which causes these threads to stall. As the number of cores in modern architectures increases, this contention further hampers the scalability of OpenMP tasking. We propose a mechanism that creates graphs representing sets of OpenMP tasks. Once built, executing such graphs incurs less runtime overhead because it drastically reduces accesses to shared resources. This mechanism is exposed to users as a new OpenMP directive, taskgraph.

Secondly, we implemented the proposed taskgraph directive in both the GCC compiler (prototype implementation) and the LLVM compiler (complete implementation). Initially, our focus was on GCC, particularly its runtime system, libgomp. Our prototype implementation in this compiler demonstrated promising performance improvements with taskgraph. However, it also revealed a performance bottleneck in libgomp: all tasks are scheduled into a common queue, which leads to significant contention and results in poorer performance and scalability than LLVM. The complete implementation of the taskgraph framework is in LLVM. Our modifications range from the front-end to the middle-end of the compiler and also cover its runtime library, libomp. This framework allows users to declare taskgraph directives in OpenMP C/C++ code to create graphs conveniently, at either compile time or run time. Experiments carried out on nodes of the MareNostrum 4 supercomputer show that the taskgraph framework outperforms the original task implementations of GCC and LLVM.

Finally, we enhanced the OpenMP offloading mechanism by leveraging taskgraph. In particular, we implemented the transformation of taskgraphs into CUDA graphs. Consequently, our framework enhances the interoperability of OpenMP with other programming models (in this case, CUDA) and improves the performance of the OpenMP accelerator model by alleviating synchronization overhead. With these contributions, this thesis improves both the OpenMP tasking and accelerator models. The framework has been used by other Ph.D. students in their research. For example, Cyril Cetre from Thales Research and Technology improved the performance of a cyber-physical application by using static generation of CUDA graphs, as presented in this manuscript. Furthermore, the OpenMP Language Committee accepted our proposal to include the taskgraph directive in the OpenMP Specification v6.0. This thesis also contributed to the upstream LLVM repository. These commits focus mainly on the record-and-replay mechanism of taskgraph and also serve as a basis for the official taskgraph implementation in LLVM. We hope that these efforts will promote the use of OpenMP tasks in general.
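The record-and-replay idea mentioned above can be illustrated with a minimal, single-threaded sketch in plain C. It is not the libgomp or libomp implementation described in the thesis: every name here (`graph_t`, `graph_add_task`, `graph_add_edge`, `graph_replay`, `replay_demo`) is hypothetical, and the four-task diamond graph is an assumed example. The point it shows is that dependencies are resolved once, when the graph is recorded, so each later replay only copies stored predecessor counts instead of re-creating tasks and re-matching dependencies through a shared structure.

```c
#include <assert.h>

#define MAX_TASKS 8
#define MAX_SUCCS 4

typedef void (*task_fn)(double *data);

/* One recorded task: its body plus the edges resolved at record time. */
typedef struct {
    task_fn fn;
    int n_preds;            /* number of predecessors, fixed after recording */
    int n_succs;
    int succs[MAX_SUCCS];   /* indices of successor tasks */
} node_t;

typedef struct {
    node_t nodes[MAX_TASKS];
    int n_nodes;
} graph_t;

/* Record phase: register a task and return its index. */
static int graph_add_task(graph_t *g, task_fn fn) {
    assert(g->n_nodes < MAX_TASKS);
    node_t *n = &g->nodes[g->n_nodes];
    n->fn = fn;
    n->n_preds = 0;
    n->n_succs = 0;
    return g->n_nodes++;
}

/* Record phase: store the dependency "from must finish before to". */
static void graph_add_edge(graph_t *g, int from, int to) {
    node_t *src = &g->nodes[from];
    assert(src->n_succs < MAX_SUCCS);
    src->succs[src->n_succs++] = to;
    g->nodes[to].n_preds++;
}

/* Replay phase: run the recorded graph without re-resolving dependencies.
 * A real runtime would let several threads take ready tasks concurrently;
 * this sketch runs them one by one from a local FIFO. */
static void graph_replay(const graph_t *g, double *data) {
    int pending[MAX_TASKS];
    int ready[MAX_TASKS];
    int head = 0, tail = 0;

    for (int i = 0; i < g->n_nodes; i++) {
        pending[i] = g->nodes[i].n_preds;
        if (pending[i] == 0)
            ready[tail++] = i;
    }
    while (head < tail) {
        const node_t *n = &g->nodes[ready[head++]];
        n->fn(data);
        for (int s = 0; s < n->n_succs; s++)
            if (--pending[n->succs[s]] == 0)
                ready[tail++] = n->succs[s];
    }
}

/* Assumed example tasks forming a diamond: init -> {left, right} -> join. */
static void t_init(double *d)  { d[0] += 1.0; }
static void t_left(double *d)  { d[1] = d[0] + 1.0; }
static void t_right(double *d) { d[2] = d[0] * 3.0; }
static void t_join(double *d)  { d[3] = d[1] + d[2]; }

/* Record the graph once, replay it `iterations` times, return the join value. */
double replay_demo(int iterations) {
    graph_t g;
    double data[4] = {0.0, 0.0, 0.0, 0.0};

    g.n_nodes = 0;
    int a = graph_add_task(&g, t_init);
    int b = graph_add_task(&g, t_left);
    int c = graph_add_task(&g, t_right);
    int d = graph_add_task(&g, t_join);
    graph_add_edge(&g, a, b);
    graph_add_edge(&g, a, c);
    graph_add_edge(&g, b, d);
    graph_add_edge(&g, c, d);

    for (int i = 0; i < iterations; i++)
        graph_replay(&g, data);
    return data[3];
}
```

Calling `replay_demo(3)` from a `main` function records the diamond once and replays it three times. The cost of building the graph is paid a single time, which is the property the abstract attributes to executing a built task graph.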