GPU concurrency: Weak behaviours and programming assumptions
University College London
20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015), 2015
@inproceedings{alglave2015gpu,
title={GPU concurrency: Weak behaviours and programming assumptions},
author={Alglave, Jade and Batty, Mark and Donaldson, Alastair F and Gopalakrishnan, Ganesh and Ketema, Jeroen and Poetzl, Daniel and Sorensen, Tyler and Wickerson, John},
booktitle={Architectural Support for Programming Languages and Operating Systems, ASPLOS},
volume={15},
pages={14--18},
year={2015}
}
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming – often supported by official tutorials – as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
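To illustrate the kind of litmus test the study uses, below is a minimal sketch of the classic message-passing (MP) shape as a CUDA kernel. The kernel name, parameter names, and thread assignment are illustrative, not taken from the paper's tool; the actual generated tests additionally place the two threads in varying scopes (warp, block, device) and apply memory stress.

```cuda
// Hypothetical MP litmus test sketch: thread 0 writes data then a flag;
// thread 1 reads the flag then the data. Names are illustrative.
__global__ void mp_litmus(volatile int *x, volatile int *y,
                          int *r0, int *r1) {
  if (threadIdx.x == 0) {
    *x = 1;        // write data
    *y = 1;        // then set flag
  } else if (threadIdx.x == 1) {
    *r0 = *y;      // read flag
    *r1 = *x;      // then read data
  }
}
```

Under sequential consistency the outcome r0 == 1 && r1 == 0 is forbidden, since seeing the flag should imply seeing the data; observing it on hardware (as the study does) is exactly the kind of weak behaviour that invalidates folklore assumptions about inter-thread communication without fences.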
January 21, 2015 by hgpu