https://hgpu.org/?p=1853
Cache-efficient numerical algorithms using graphics hardware