https://hgpu.org/?p=5069
Approaches for parallelizing reductions on modern GPUs