https://hgpu.org/?p=14241
Sorting and Permuting without Bank Conflicts on GPUs