https://hgpu.org/?p=7310
Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments