https://hgpu.org/?p=5933
Fast Implementation of DGEMM on Fermi GPU