https://hgpu.org/?p=1084
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead