https://hgpu.org/?p=19166
DBCSR: A Library for Dense Matrix Multiplications on Distributed GPU-Accelerated Systems