https://hgpu.org/?p=8891
An Implementation of Conflict-Free Offline Permutation on the GPU