https://hgpu.org/?p=4475
Efficient implementation of the overlap operator on multi-GPUs