https://hgpu.org/?p=15012
Orchestrating Multiple Data-Parallel Kernels on Multiple Devices