https://hgpu.org/?p=13386
Reproducible and Accurate Matrix Multiplication for GPU Accelerators