https://hgpu.org/?p=16774
dCUDA: hardware supported overlap of computation and communication