https://hgpu.org/?p=18613
Dense and sparse parallel linear algebra algorithms on graphics processing units