Over-synchronization in GPU Programs
Indian Institute of Science, Bengaluru, India
57th IEEE/ACM International Symposium on Microarchitecture (MICRO’24), 2024
@inproceedings{nayak2024over,
title={Over-synchronization in GPU Programs},
author={Nayak, Ajay and Basu, Arkaprava},
booktitle={57th IEEE/ACM International Symposium on Microarchitecture (MICRO'24)},
year={2024}
}
The performance of GPU (Graphics Processing Unit)-accelerated functions affects a large spectrum of modern software. Efficiently synchronizing across thousands of concurrent threads is critical to the performance of GPU programs. GPU vendors have introduced advanced programming constructs, e.g., scopes, for efficiently synchronizing within a chosen subset of threads. However, programmers must explicitly employ them, where applicable, to benefit from such features. We demonstrate how GPU programs can leave performance on the table by failing to fully harness advanced synchronization features in modern GPUs – leading to over-synchronization. We discover three different variants of over-synchronization observed in real-world applications. We then build a tool, ScopeAdvice, to find cases of over-synchronization in CUDA programs. Avoiding reported over-synchronization improves the performance of several GPU applications by up to 55%.
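As a rough illustration of what a scope wider than necessary can look like, the following CUDA sketch contrasts a device-scope atomic with a block-scope one. It is not taken from the paper and is not claimed to match any of its three variants or anything ScopeAdvice reports. The kernel names, the even-value counting task, and the data layout are all hypothetical. It assumes a GPU of compute capability 6.0 or newer, which `atomicAdd_block` requires (compile with, e.g., `nvcc -arch=sm_60` or higher). Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels: every block counts the even values in its own slice of
// `data` into its own slot of `per_block`, so only threads of one block ever
// update a given counter.

__global__ void count_even_device_scope(const int *data, int n, int *per_block) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0) {
        // Device-scope atomic: wider than this access pattern requires.
        atomicAdd(&per_block[blockIdx.x], 1);
    }
}

__global__ void count_even_block_scope(const int *data, int n, int *per_block) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0) {
        // Block-scope atomic: sufficient here because the counter is only
        // ever touched by threads of this block.
        atomicAdd_block(&per_block[blockIdx.x], 1);
    }
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *data = nullptr, *wide = nullptr, *narrow = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&wide, blocks * sizeof(int));
    cudaMalloc(&narrow, blocks * sizeof(int));
    cudaMemcpy(data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(wide, 0, blocks * sizeof(int));
    cudaMemset(narrow, 0, blocks * sizeof(int));

    count_even_device_scope<<<blocks, threads>>>(data, n, wide);
    count_even_block_scope<<<blocks, threads>>>(data, n, narrow);

    // cudaMemcpy on the default stream waits for both kernels to finish.
    std::vector<int> a(blocks), b(blocks);
    cudaMemcpy(a.data(), wide, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), narrow, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    bool same = (a == b);
    printf("per-block counts %s\n", same ? "match" : "differ");

    cudaFree(data);
    cudaFree(wide);
    cudaFree(narrow);
    return same ? 0 : 1;
}
```

Both kernels compute the same per-block counts. They differ only in the scope requested for the atomic. Whether the narrower scope yields a measurable speedup depends on the hardware and the workload, and this sketch makes no attempt to reproduce the paper's measurements.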
November 10, 2024 by hgpu