https://hgpu.org/?p=15008
Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution