https://hgpu.org/?p=16699
Balancing locality and concurrency: solving sparse triangular systems on GPUs