The Model of Computation of CUDA and its Formal Semantics
Institut für Informatik, Universität Augsburg
Universität Augsburg, 2011
@article{habermaier2011model,
  title={The Model of Computation of CUDA and its Formal Semantics},
  author={Habermaier, A.},
  year={2011},
  publisher={Universit{\"a}tsbibliothek der Universit{\"a}t Augsburg}
}
We formalize the model of computation of modern graphics cards based on the specification of Nvidia’s Compute Unified Device Architecture (CUDA). CUDA programs are executed concurrently by thousands of threads, which have access to several different types of memory with distinct access patterns and latencies. The underlying hardware uses a single instruction, multiple threads (SIMT) execution model that groups threads into warps; all threads of the same warp execute the program in lockstep. If threads of the same warp take different branches of a data-dependent control flow instruction, control flow diverges and the different execution paths are executed sequentially. Once all paths complete execution, all threads of the warp execute in parallel again. We present an operational semantics of a significant subset of CUDA’s memory operations and programming instructions, including shared and non-shared memory operations, atomic operations on shared memory, cached memory access, recursive function calls, control flow instructions, and thread synchronization. Based on this formalization, we prove that CUDA’s SIMT execution model is safe: every thread provably executes only the instructions it is allowed to execute, does not continue execution after processing the last instruction of the program, and does not skip any instruction it should have executed. Conversely, we demonstrate that CUDA’s inability to handle control flow instructions individually for each thread can cause unexpected program behavior, in the sense that a liveness property is violated.
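The liveness violation mentioned in the abstract is commonly illustrated by an intra-warp spinlock. The following sketch is not taken from the paper; the kernel name, the lock layout, and the assumption of a classic (pre-Volta) SIMT scheduler without independent thread scheduling are ours, chosen to match the lockstep model the abstract describes.

```cuda
#include <cstdio>

// Hypothetical kernel illustrating the liveness issue under lockstep SIMT
// execution: divergent paths of a warp run sequentially, and the warp
// reconverges only after every path has finished.
__global__ void intra_warp_spinlock(int *lock, int *counter) {
    bool done = false;
    while (!done) {
        // Exactly one thread of the warp wins the atomicCAS per attempt...
        if (atomicCAS(lock, 0, 1) == 0) {
            *counter += 1;        // critical section
            atomicExch(lock, 0);  // release the lock
            done = true;
        }
        // ...but the losing threads keep spinning on the other side of the
        // divergent branch. If the SIMT scheduler runs the spinning path
        // first and only switches paths at reconvergence points, the winning
        // thread never gets to release the lock: the warp livelocks, even
        // though every individual instruction executes safely.
    }
}

int main() {
    int *lock, *counter;
    cudaMalloc(&lock, sizeof(int));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(lock, 0, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    // One warp of 32 threads contends for the lock; on hardware following
    // the lockstep model formalized in the paper, this launch may never
    // terminate.
    intra_warp_spinlock<<<1, 32>>>(lock, counter);
    cudaDeviceSynchronize();

    cudaFree(lock);
    cudaFree(counter);
    return 0;
}
```

Note that later GPU generations (Volta and newer) introduced independent thread scheduling, which avoids this particular hang; the paper's formalization targets the earlier lockstep behavior.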
October 25, 2011 by hgpu