https://hgpu.org/?p=13253
Efficient GPU Implementation for Single Block Orthogonal Dictionary Learning