https://hgpu.org/?p=17583
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures