https://hgpu.org/?p=13406
Different Optimization Strategies and Performance Evaluation of Reduction on Multicore CUDA Architecture