A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs
Tokyo Institute of Technology
arXiv:2004.05371 [cs.DC], (11 Apr 2020)
@misc{zhang2020study,
  title={A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs},
  author={Lingqi Zhang and Mohamed Wahib and Haoyu Zhang and Satoshi Matsuoka},
  year={2020},
  eprint={2004.05371},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia’s latest CUDA provides a variety of synchronization methods, yet until now there has been no comprehensive understanding of their characteristics. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The provided analysis would be useful when making design choices for applications, libraries, and frameworks running on single- and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be applied. We also describe our micro-benchmarks and measurement methods.
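For context on the kind of synchronization scope the abstract refers to, the following is a minimal sketch (not taken from the paper) of grid-level synchronization with CUDA cooperative groups, applied to the reduction pattern used in the case study. The kernel name, launch parameters, and reduction shape are illustrative assumptions; grid.sync() additionally requires a cooperative launch on a compute capability 6.0+ device and is typically compiled with nvcc in separate compilation mode (-rdc=true).

#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Two-phase sum reduction that relies on a grid-wide barrier (grid.sync()).
// Hypothetical example; names and launch parameters are illustrative.
__global__ void gridSumKernel(const float *in, float *partials, float *out, int n) {
    cg::grid_group grid = cg::this_grid();
    cg::thread_block block = cg::this_thread_block();
    extern __shared__ float smem[];

    // Phase 1: grid-stride partial sum per thread.
    float sum = 0.0f;
    for (int i = (int)grid.thread_rank(); i < n; i += (int)grid.size())
        sum += in[i];

    // Block-level tree reduction in shared memory.
    smem[block.thread_rank()] = sum;
    block.sync();
    for (unsigned s = block.size() / 2; s > 0; s >>= 1) {
        if (block.thread_rank() < s)
            smem[block.thread_rank()] += smem[block.thread_rank() + s];
        block.sync();
    }
    if (block.thread_rank() == 0)
        partials[blockIdx.x] = smem[0];

    // Grid-wide barrier: every block must publish its partial before phase 2.
    grid.sync();

    // Phase 2: a single thread folds the per-block partials into the result.
    if (grid.thread_rank() == 0) {
        float total = 0.0f;
        for (int b = 0; b < (int)gridDim.x; ++b)
            total += partials[b];
        *out = total;
    }
}

int main() {
    int n = 1 << 20;
    const int threads = 256, blocks = 64;  // all blocks must be co-resident on the GPU
    float *in, *partials, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partials, blocks * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    // grid.sync() is only valid for kernels started with a cooperative launch.
    void *args[] = { &in, &partials, &out, &n };
    cudaLaunchCooperativeKernel((void *)gridSumKernel, dim3(blocks), dim3(threads),
                                args, threads * sizeof(float), 0);
    cudaDeviceSynchronize();
    printf("sum = %.0f (expected %d)\n", *out, n);
    return 0;
}

The same reduction could instead be written as two separate kernel launches, with the implicit synchronization between launches replacing grid.sync(); comparing the cost of such alternatives is exactly the kind of design choice the paper's analysis is meant to inform.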
April 19, 2020 by hgpu