https://hgpu.org/?p=1397
Faster Radix Sort via Virtual Memory and Write-Combining