https://hgpu.org/?p=8468
Efficient implementation of data flow graphs on multi-GPU clusters