https://hgpu.org/?p=897
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication