https://hgpu.org/?p=10593
GPU-TLS: An Efficient Runtime for Speculative Loop Parallelization on GPUs