https://hgpu.org/?p=12705
Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU Accelerators