https://hgpu.org/?p=6872
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines