https://hgpu.org/?p=18796
Analyzing GPU Tensor Core Potential for Fast Reductions