https://hgpu.org/?p=6655
Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU