https://hgpu.org/?p=7686
MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems