https://hgpu.org/?p=3319
Efficient stream reduction on the GPU