https://hgpu.org/?p=8359
Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units