https://hgpu.org/?p=3331
Fast in-place sorting with CUDA based on bitonic sort