https://hgpu.org/?p=6777
Optimising the DBCSR GPU Implementation