https://hgpu.org/?p=10444
Towards a functional run-time for dense NLA domain