https://hgpu.org/?p=1770
Automatic tuning matrix multiplication performance on graphics hardware