https://hgpu.org/?p=28467
Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution