A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka
Tokyo Institute of Technology
arXiv:2004.05371 [cs.DC], (11 Apr 2020)


GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia’s latest CUDA provides a variety of synchronization methods. Until now, however, there has been no full understanding of the characteristics of these synchronization methods. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. This analysis should be useful when making design choices for applications, libraries, and frameworks running in single- and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained from our analysis can be applied. We also describe our micro-benchmarks and measurement methods.
