high performance computing on graphics processing units: hgpu.org


Efficient implementation of GPGPU synchronization primitives on CPUs

Jayanth Gummaraju, Ben Sander, Laurent Morichetti, Benedict Gaster, Lee Howes

Advanced Micro Devices, Sunnyvale, CA, USA

Proceedings of the 7th ACM international conference on Computing frontiers, CF ’10, 2010

DOI:10.1145/1787275.1787295


The GPGPU model represents a style of execution where thousands of threads execute in a data-parallel fashion, with a large subset (typically 10s to 100s) needing frequent synchronization. As the GPGPU model evolves to target both GPUs and CPUs as acceleration targets, thread synchronization becomes an important problem when running on CPUs. CPUs have little hardware support for this kind of synchronization, so it must be emulated in software, which reduces application performance. This paper presents software techniques to implement the GPGPU synchronization primitives on CPUs while maintaining application debuggability. Performing limit studies using real hardware, we evaluate the potential performance benefits of an efficient barrier primitive.
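To make the barrier primitive concrete, here is a minimal sketch of emulating a work-group barrier on a CPU in software. It is not the technique from the paper, which the abstract does not detail; it is a naive baseline that assumes one OS thread per work-item and uses Python's standard `threading.Barrier`. The names `run_work_group`, `work_item`, `local_mem`, and `result` are hypothetical and chosen only for illustration.

```python
import threading


def run_work_group(group_size):
    """Emulate one work-group on a CPU with one OS thread per work-item.

    Assumption: this naive mapping is for illustration only and is not
    the implementation described in the paper.
    """
    local_mem = [0] * group_size   # stands in for work-group local memory
    result = [0] * group_size
    barrier = threading.Barrier(group_size)  # software barrier for the group

    def work_item(local_id):
        # Phase 1: each work-item publishes a value to local memory.
        local_mem[local_id] = local_id * local_id
        # Every work-item must finish phase 1 before any starts phase 2.
        barrier.wait()
        # Phase 2: read the value written by the neighbouring work-item.
        neighbour = (local_id + 1) % group_size
        result[local_id] = local_mem[neighbour]

    threads = [
        threading.Thread(target=work_item, args=(i,))
        for i in range(group_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(run_work_group(8))  # [1, 4, 9, 16, 25, 36, 49, 0]
```

Without the `barrier.wait()` call, a work-item could read its neighbour's slot before the neighbour has written it. The cost of blocking and waking a thread at every such barrier is the kind of software overhead the abstract refers to.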

Tags: Computer science, OpenCL, Performance, Programming techniques

August 23, 2011 by hgpu
