Characterizing CUDA and OpenMP Synchronization Primitives
Department of Computer Science, Texas State University, San Marcos, USA
IEEE International Symposium on Workload Characterization (IISWC’24), 2024
@inproceedings{burtchell2024characterizing,
title={Characterizing CUDA and OpenMP Synchronization Primitives},
author={Burtchell, Brandon Alexander and Burtscher, Martin},
booktitle={IEEE International Symposium on Workload Characterization (IISWC'24)},
year={2024}
}
Over the last two decades, parallelism has become the primary method for speeding up computer programs. When writing parallel code, it is often necessary to use synchronization primitives (e.g., atomics, barriers, or critical sections) to ensure correctness. However, the performance of synchronization primitives depends on a variety of complex factors that non-experts may be unaware of. Since multiple primitives can typically be used to complete the same task, choosing the best one is often non-trivial. In this paper, we study the performance impact of these factors by measuring the throughput of OpenMP and CUDA synchronization primitives along multiple dimensions. We highlight interesting and non-intuitive behavior that software developers should be aware of when writing parallel programs.
August 25, 2024 by hgpu