https://hgpu.org/?p=1141
Implementing sparse matrix-vector multiplication on throughput-oriented processors