https://hgpu.org/?p=2233
Performance Study of LU Decomposition on the Programmable GPU