https://hgpu.org/?p=15693
Efficient Parallel Implementation for Single Block Orthogonal Dictionary Learning