https://hgpu.org/?p=21539
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous