https://hgpu.org/?p=12652
Improved GPU Co-processor Sorting Algorithm with Barrier Synchronization