https://hgpu.org/?p=2600
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures