To GPU Synchronize or Not GPU Synchronize?

Wu-Chun Feng, Shucai Xiao

Department of Computer Science, Virginia Tech

Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2010, pp. 3801-3804

DOI:10.1109/ISCAS.2010.5537722

The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages into a programmable processor with many fixed-function components that deliver massive parallelism. By modifying the GPU's stream processor to support "general-purpose computation" on the GPU (GPGPU), applications that perform massive vector operations can realize performance improvements of many orders of magnitude over a traditional processor, i.e., a CPU. However, the breadth of general-purpose computation that a GPU can support efficiently has largely been limited to highly data-parallel or task-parallel applications. The reason is the lack of explicit support for communication between the streaming multiprocessors (SMs) on the GPU. Such communication can occur via the GPU's global memory, but it then requires a barrier synchronization across the SMs to complete the communication between them. Our previous work demonstrated that implementing barrier synchronization on the GPU itself can significantly improve performance and deliver correct results in critical bioinformatics applications. However, the correctness of inter-SM communication can only be guaranteed if a memory consistency model is assumed. To address this problem, NVIDIA introduced the __threadfence() function in CUDA 2.2, which can guarantee the correctness of GPU-based inter-SM communication. However, this function currently introduces so much overhead that direct GPU synchronization using it performs worse than indirect synchronization via the CPU. This raises the question: "to GPU synchronize or not GPU synchronize?"

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 280, Programming techniques, Synchronization

March 12, 2011 by hgpu