https://hgpu.org/?p=8282
Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA