https://hgpu.org/?p=6332
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures