A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs
Tokyo Institute of Technology
arXiv:2004.05371 [cs.DC], (11 Apr 2020)
@misc{zhang2020study,
  title={A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs},
  author={Lingqi Zhang and Mohamed Wahib and Haoyu Zhang and Satoshi Matsuoka},
  year={2020},
  eprint={2004.05371},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia’s latest CUDA provides a variety of synchronization methods, yet until now there has been no comprehensive understanding of their characteristics. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The provided analysis would be useful when making design choices for applications, libraries, and frameworks running on single- and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be applied. We also describe our micro-benchmarks and measurement methods.
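For context on the kind of synchronization scope the abstract refers to, the following is a minimal sketch (not taken from the paper) of grid-level synchronization with CUDA cooperative groups, applied to the reduction pattern used in the case study. The kernel name, launch parameters, and reduction shape are illustrative assumptions; grid.sync() additionally requires a cooperative launch on a compute capability 6.0+ device and is typically compiled with nvcc in separate compilation mode (-rdc=true).

#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Two-phase sum reduction that relies on a grid-wide barrier (grid.sync()).
// Hypothetical example; names and launch parameters are illustrative.
__global__ void gridSumKernel(const float *in, float *partials, float *out, int n) {
    cg::grid_group grid = cg::this_grid();
    cg::thread_block block = cg::this_thread_block();
    extern __shared__ float smem[];

    // Phase 1: grid-stride partial sum per thread.
    float sum = 0.0f;
    for (int i = (int)grid.thread_rank(); i < n; i += (int)grid.size())
        sum += in[i];

    // Block-level tree reduction in shared memory.
    smem[block.thread_rank()] = sum;
    block.sync();
    for (unsigned s = block.size() / 2; s > 0; s >>= 1) {
        if (block.thread_rank() < s)
            smem[block.thread_rank()] += smem[block.thread_rank() + s];
        block.sync();
    }
    if (block.thread_rank() == 0)
        partials[blockIdx.x] = smem[0];

    // Grid-wide barrier: every block must publish its partial before phase 2.
    grid.sync();

    // Phase 2: a single thread folds the per-block partials into the result.
    if (grid.thread_rank() == 0) {
        float total = 0.0f;
        for (int b = 0; b < (int)gridDim.x; ++b)
            total += partials[b];
        *out = total;
    }
}

int main() {
    int n = 1 << 20;
    const int threads = 256, blocks = 64;  // all blocks must be co-resident on the GPU
    float *in, *partials, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partials, blocks * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    // grid.sync() is only valid for kernels started with a cooperative launch.
    void *args[] = { &in, &partials, &out, &n };
    cudaLaunchCooperativeKernel((void *)gridSumKernel, dim3(blocks), dim3(threads),
                                args, threads * sizeof(float), 0);
    cudaDeviceSynchronize();
    printf("sum = %.0f (expected %d)\n", *out, n);
    return 0;
}

The same reduction could instead be written as two separate kernel launches, with the implicit synchronization between launches replacing grid.sync(); comparing the cost of such alternatives is exactly the kind of design choice the paper's analysis is meant to inform.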
April 19, 2020 by hgpu