https://hgpu.org/?p=2545
Fast development of dense linear algebra codes on graphics processors