https://hgpu.org/?p=8637
Matrix-Matrix Multiplications on GPUs for Accelerating a Parallel Fluid Dynamics Code