Formalizing Address Spaces with application to Cuda, OpenCL, and beyond

Benedict R. Gaster
Advanced Micro Devices, 1 AMD, Sunnyvale, CA, USA
6th Annual Workshop on General Purpose Processing with Graphics Processing Units (GPGPU 6), 2013


   title={Formalizing Address Spaces with application to Cuda, OpenCL, and beyond},

   author={Gaster, Benedict R},



Download Download (PDF)   View View   Source Source   



Cuda and OpenCL are aimed at programmers developing parallel applications targeting GPUs and embedded micro-processors. These systems often have explicitly managed memories exposed directly though a notion of disjoint address spaces. OpenCL address spaces are based on a similar concept found in Embedded C. A limitation of OpenCL is that a specific pointer must be assigned to a particular address space and thus functions, for example, must say which pointer arguments point to which address spaces. This leads to a loss of composability and moreover can lead to implementing multiple versions of the same function. This problem is compounded in the OpenCL C++ variant where a class’ implicit this pointer can be applied to multiple address spaces. Modern GPUs, such as AMD’s Graphics Core Next and Nvidia’s Fermi, support an additional generic address space that dynamically determines an address’ disjoint address space, submitting the correct load/store operation to the particular memory subsystem. Generic address spaces allow for dynamic casting between generic and non-generic address spaces that is similar to the dynamic subtyping found in objected oriented languages. The advantage of the generic address space is it simplifies the programming model but sometimes at the cost of decreased performance, both dynamically and due to the optimization a compiler can safely perform. This paper describes a new type system for inferring Cuda and OpenCL style address spaces. We show that the address space system can be inferred. We extend this base system with a notion of generic address space, including dynamic casting, and show that there also exists a static translation to architectures without support for generic address spaces but comes at a potential performance cost. This performance cost can be reclaimed when an architecture directly supports generic address space.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: