https://hgpu.org/?p=8344
Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs