https://hgpu.org/?p=5953
An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs