Sequential Consistency for Heterogeneous-Race-Free: Programmer-centric Memory Models for Heterogeneous Platforms

Derek R. Hower, Bradford M. Beckmann, Benedict R. Gaster, Blake A. Hechtman, Mark D. Hill, Steven K. Reinhardt, David A. Wood

AMD Research

Workshop on Memory Systems Performance and Correctness (MSPC), 2013

Hardware vendors now provide heterogeneous platforms in commodity markets (e.g., integrated CPUs and GPUs), and are promising an integrated, shared memory address space for such platforms in future iterations. Because not all threads in a heterogeneous platform can communicate with the same latency, vendors are proposing synchronization mechanisms that allow threads to communicate with a subset of threads (called a scope). However, vendors have yet to define a comprehensive and portable memory model that programmers can use to reason about scopes. Moreover, existing CPU memory models, such as Sequential Consistency for Data-Race-Free (SC for DRF), are ill-suited, in part, because they define all synchronization operations globally and preclude low-energy, high-performance local coordination. Towards this end, we embrace scoped synchronization with a new class of memory consistency models: Sequential Consistency for Heterogeneous-Race-Free (SC for HRF). Inspired by SC for DRF (C++, Java), the new models provide programmers with SC for programs with "sufficient" synchronization (no data races) of "sufficient" scope. We develop the first such model, called HRF0, show how it can be used to develop high-performance code, show example hardware support, and motivate future work.
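
As a rough illustration of what "sufficient" scope means in the abstract above, the following toy sketch checks whether a given scope contains both communicating threads. It is not the paper's formal HRF0 definition. The three scope names, the `Thread` fields, and both helper functions are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical scope names, ordered from narrowest to widest.
# Real platforms define their own scope hierarchies.
SCOPES = ("work_group", "device", "system")


@dataclass(frozen=True)
class Thread:
    """Toy thread identity: which device and which work-group it runs in."""
    device: int
    work_group: int


def scope_covers(scope: str, a: Thread, b: Thread) -> bool:
    """Return True if a single instance of `scope` contains both threads."""
    if scope == "system":
        return True
    if scope == "device":
        return a.device == b.device
    if scope == "work_group":
        return a.device == b.device and a.work_group == b.work_group
    raise ValueError(f"unknown scope: {scope}")


def narrowest_sufficient_scope(a: Thread, b: Thread) -> str:
    """Return the narrowest scope that still contains both threads."""
    for scope in SCOPES:
        if scope_covers(scope, a, b):
            return scope
    raise AssertionError("unreachable: 'system' covers every pair")
```

In this toy model, two threads in different work-groups of the same device that synchronize only at `work_group` scope have not used a scope that contains both of them, so the synchronization would not be of "sufficient" scope in the abstract's sense. Choosing the narrowest scope that still covers both threads reflects the abstract's motivation of low-energy, high-performance local coordination.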

Tags: Computer science, CUDA, Heterogeneous systems, Memory model, nVidia, OpenCL, Performance

May 23, 2013 by hgpu
