https://hgpu.org/?p=16654
Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online