https://hgpu.org/?p=24690
Scalable communication for high-order stencil computations using CUDA-aware MPI