https://hgpu.org/?p=28593
Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs