Characterizing CUDA and OpenMP Synchronization Primitives

Brandon Alexander Burtchell, Martin Burtscher
Department of Computer Science, Texas State University, San Marcos, USA
IEEE International Symposium on Workload Characterization (IISWC’24), 2024

@article{burtchell2024characterizing,
   title={Characterizing CUDA and OpenMP Synchronization Primitives},
   author={Burtchell, Brandon Alexander and Burtscher, Martin},
   year={2024}
}


Over the last two decades, parallelism has become the primary method for speeding up computer programs. When writing parallel code, it is often necessary to use synchronization primitives (e.g., atomics, barriers, or critical sections) to enforce correctness. However, the performance of synchronization primitives depends on a variety of complex factors that non-experts may be unaware of. Since multiple primitives can typically be used to complete the same task, choosing the best one is often non-trivial. In this paper, we study the performance impact of these factors by measuring the throughput of OpenMP and CUDA synchronization primitives along multiple dimensions. We highlight interesting and non-intuitive behavior that software developers should be aware of when writing parallel programs.
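To make the remark that several primitives can complete the same task concrete, here is a minimal sketch in C with OpenMP. It is not taken from the paper: the function names and the shared-counter task are assumptions chosen purely for illustration. Each function counts `n` loop iterations correctly, but each uses a different synchronization mechanism.

```c
/* Illustrative sketch only: the function names and the counting task are
 * hypothetical and do not come from the paper. */

/* Variant 1: each increment is an atomic update of the shared counter. */
long count_with_atomic(long n) {
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        counter++;
    }
    return counter;
}

/* Variant 2: each increment runs inside a critical section, so only one
 * thread at a time executes the update. */
long count_with_critical(long n) {
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp critical
        {
            counter++;
        }
    }
    return counter;
}

/* Variant 3: each thread increments a private copy; OpenMP combines the
 * private copies into the shared counter at the end of the loop. */
long count_with_reduction(long n) {
    long counter = 0;
    #pragma omp parallel for reduction(+:counter)
    for (long i = 0; i < n; i++) {
        counter++;
    }
    return counter;
}
```

Compiled with an OpenMP-enabled compiler (for example, `gcc -fopenmp`), all three variants return `n`. They are equivalent in result only; how their throughput compares is the kind of question the abstract says depends on factors that are not obvious in advance.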
