https://hgpu.org/?p=9611
GPU Matrix Multiplication