https://hgpu.org/?p=26151
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs