high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Towards scalar synchronization in SIMT architectures

Towards scalar synchronization in SIMT architectures

Arun Ramamurthy

The University Of British Columbia, Vancouver

The University Of British Columbia, 2011

@article{ramamurthy2011towards,

title={Towards scalar synchronization in SIMT architectures},

author={Ramamurthy, A.},

year={2011},

publisher={University of British Columbia}

}

Download (PDF)

View

Source

1589

views

An important class of compute accelerators are graphics processing units (GPUs). Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads as a single warp or wavefront and executes this group of scalar threads in lockstep. The inherent mismatch between scalar programming model and vector hardware creates a challenge when developing applications that employ synchronization on the GPU. This challenge arises from the use of a hardware stack to manage control flow divergence among scalar threads. This thesis explains the porting of the Apriori benchmark to a GPU which led to the research on synchronization in SIMT hardware. It then proposes instruction set and hardware changes that simplify the implementation of mutual exclusion when porting multiple-instruction, multiple data (MIMD) programs with synchronization to accelerators employing single-instruction, multiple thread (SIMT) hardware. These instructions when compared with more complex software only solutions, achieve similar performance. This thesis also implements and evaluates queue based mutual exclusion on SIMT hardware.

Tags: Benchmarking, Computer science, CUDA, nVidia, OpenCL, Performance, Programming techniques, Thesis

October 14, 2011 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Towards scalar synchronization in SIMT architectures

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)

Towards scalar synchronization in SIMT architectures

Share this:

Recent source codes

Most viewed papers (last 30 days)