high performance computing on graphics processing units: hgpu.org

Characterizing CUDA and OpenMP Synchronization Primitives

Brandon Alexander Burtchell, Martin Burtscher

Department of Computer Science, Texas State University, San Marcos, USA

IEEE International Symposium on Workload Characterization (IISWC’24), 2024

Package: SyncPerformance

Over the last two decades, parallelism has become the primary method for speeding up computer programs. When writing parallel code, it is often necessary to use synchronization primitives (e.g., atomics, barriers, or critical sections) to enforce correctness. However, the performance of synchronization primitives depends on a variety of complex factors that non-experts may be unaware of. Since multiple primitives can typically be used to complete the same task, choosing the best is often non-trivial. In this paper, we study the performance impact of these factors by measuring the throughput of OpenMP and CUDA synchronization primitives along multiple dimensions. We highlight interesting and non-intuitive behavior that software developers should be aware of when writing parallel programs.
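
To illustrate the abstract's point that several primitives can complete the same task, here is a minimal sketch in C++ with OpenMP. It is not taken from the paper or from the SyncPerformance package; the function names are hypothetical and chosen only for illustration. Each function counts loop iterations into a shared counter, using an atomic update, a critical section, or a reduction clause.

```cpp
#include <cstdint>

// Hypothetical illustration, not the authors' benchmark code.
// All three functions compute the same value (n) using a different
// OpenMP synchronization mechanism.

// Variant 1: every increment is an atomic read-modify-write.
std::int64_t count_with_atomic(std::int64_t n) {
    std::int64_t counter = 0;
    #pragma omp parallel for
    for (std::int64_t i = 0; i < n; ++i) {
        #pragma omp atomic
        counter += 1;
    }
    return counter;
}

// Variant 2: every increment runs inside a critical section,
// so only one thread at a time executes the update.
std::int64_t count_with_critical(std::int64_t n) {
    std::int64_t counter = 0;
    #pragma omp parallel for
    for (std::int64_t i = 0; i < n; ++i) {
        #pragma omp critical
        {
            counter += 1;
        }
    }
    return counter;
}

// Variant 3: each thread accumulates into a private copy,
// and the copies are combined once at the end of the loop.
std::int64_t count_with_reduction(std::int64_t n) {
    std::int64_t counter = 0;
    #pragma omp parallel for reduction(+ : counter)
    for (std::int64_t i = 0; i < n; ++i) {
        counter += 1;
    }
    return counter;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang), all three functions return `n`, but they may differ considerably in throughput depending on thread count, contention, and hardware. That is the kind of difference the paper sets out to measure. Without the OpenMP flag the pragmas are ignored and the loops run serially with the same result.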

Tags: Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 2070, nVidia GeForce RTX 4090, OpenMP, Package, Performance

August 25, 2024 by hgpu
