https://hgpu.org/?p=16816
Automating the Last-Mile for High Performance Dense Linear Algebra