https://hgpu.org/?p=2033
GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures