https://hgpu.org/?p=10523
Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling