https://hgpu.org/?p=1207
Efficient computation of sum-products on GPUs through software-managed cache