https://hgpu.org/?p=3302
Improving accuracy for matrix multiplications on GPUs