high performance computing on graphics processing units: hgpu.org

Towards shared memory consistency models for GPUs

Tyler Sorensen

The University of Utah

The University of Utah, 2013

@phdthesis{sorensen2013towards,
  title={Towards shared memory consistency models for GPUs},
  author={Sorensen, Tyler},
  year={2013},
  school={The University of Utah}
}

With the widespread use of GPUs, it is important to ensure that programmers have a clear understanding of their shared memory consistency model, i.e., what values a read can return when it is issued concurrently with writes. While memory consistency has been studied for CPUs, GPUs present very different memory and concurrency systems and have not been well studied. We propose a collection of litmus tests that shed light on interesting visibility and ordering properties. These include classical shared memory consistency model properties, such as coherence and write atomicity, as well as GPU-specific properties, e.g., differences in memory visibility between threads in the same block (intra-block) and threads in different blocks (inter-block). The results of the litmus tests are determined by feedback from industry experts, the limited documentation available, and properties common to all modern multi-core systems; some of the behaviors remain unresolved. Using the results of the litmus tests, we establish a formal state transition model using intuitive data structures and operations. We implement our model in the Murphi modeling language and verify it against the initial litmus tests. As a preliminary study, we restrict our model to loads and stores across global and shared memory, along with two of the memory fences given in NVIDIA PTX: thread fence and thread fence block. Finally, we show real-world examples of code that make assumptions about the GPU shared memory consistency model that are inconsistent with our proposed model.
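To illustrate what a litmus test asks, here is a minimal Python sketch that enumerates every allowed outcome of the classic message-passing (MP) test. This is not the thesis's Murphi model. Under sequential consistency (SC), each thread runs in program order and the threads interleave. Under a second, hypothetical "toy relaxed" rule, a thread may reorder two accesses to different addresses unless a fence separates them. The instruction encoding and the function names `thread_orders`, `interleavings`, and `outcomes` are illustrative assumptions. The toy model also assumes write atomicity (a store becomes visible to every thread at once) and has no notion of block versus device scope. It therefore cannot distinguish thread fence block from thread fence.

```python
from itertools import permutations, product

# Hypothetical instruction encoding, for illustration only:
#   ("st", addr, value)  store a constant to a shared location
#   ("ld", addr, reg)    load a shared location into a named register
#   ("fence",)           keep accesses before it ordered with accesses after it

def thread_orders(prog, relaxed):
    """Yield each order in which one thread's accesses may take effect."""
    accesses = [i for i, ins in enumerate(prog) if ins[0] != "fence"]
    if not relaxed:
        # Sequential consistency: program order only.
        yield [prog[i] for i in accesses]
        return
    fences = [i for i, ins in enumerate(prog) if ins[0] == "fence"]

    def must_stay_ordered(a, b):
        # Toy rule (assumption): same-address accesses, or accesses separated
        # by a fence, keep program order; any other pair may be swapped.
        return prog[a][1] == prog[b][1] or any(a < f < b for f in fences)

    for perm in permutations(accesses):
        pos = {i: k for k, i in enumerate(perm)}
        if all(pos[a] < pos[b]
               for a in accesses for b in accesses
               if a < b and must_stay_ordered(a, b)):
            yield [prog[i] for i in perm]

def interleavings(seqs):
    """Yield every merge of the per-thread sequences into one global trace."""
    if all(not s for s in seqs):
        yield []
        return
    for t, s in enumerate(seqs):
        if s:
            rest = seqs[:t] + [s[1:]] + seqs[t + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail

def outcomes(progs, relaxed=False):
    """Return the set of final register valuations; memory starts at 0."""
    results = set()
    per_thread = [list(thread_orders(p, relaxed)) for p in progs]
    for orders in product(*per_thread):
        for trace in interleavings(list(orders)):
            mem, regs = {}, {}
            for ins in trace:
                if ins[0] == "st":
                    mem[ins[1]] = ins[2]  # visible to all threads at once
                else:
                    regs[ins[2]] = mem.get(ins[1], 0)
            results.add(tuple(sorted(regs.items())))
    return results

# Message-passing (MP) litmus test: the writer sets data x, then flag y.
writer = [("st", "x", 1), ("st", "y", 1)]
reader = [("ld", "y", "r1"), ("ld", "x", "r2")]
writer_f = [("st", "x", 1), ("fence",), ("st", "y", 1)]
reader_f = [("ld", "y", "r1"), ("fence",), ("ld", "x", "r2")]
stale = (("r1", 1), ("r2", 0))  # reader saw the flag but not the data

print("SC allows stale read:", stale in outcomes([writer, reader]))
print("toy relaxed, no fences:", stale in outcomes([writer, reader], relaxed=True))
print("toy relaxed, both fenced:", stale in outcomes([writer_f, reader_f], relaxed=True))
```

Under SC and in the fully fenced relaxed case, the stale outcome (r1 = 1, r2 = 0) is forbidden. It is allowed in the unfenced relaxed case. It also remains allowed if only the writer is fenced, because the reader's loads can still be swapped. This matches the general intuition that message passing needs ordering on both sides. Whether real GPUs exhibit such outcomes, and at which scope, is exactly what the litmus tests above are meant to pin down.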

Tags: Computer science, CUDA, Memory model, nVidia, Performance, PTX, Thesis

June 4, 2013 by hgpu
