https://hgpu.org/?p=2594
Efficient Sparse Matrix-Vector Multiplication on CUDA