https://hgpu.org/?p=11878
Scheduling Dataflow Execution Across Multiple Accelerators