https://hgpu.org/?p=14083
The implementation and optimization of Bitonic sort algorithm based on CUDA