https://hgpu.org/?p=7321
Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware