https://hgpu.org/?p=7405
Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory