https://hgpu.org/?p=7461
High-Performance Matrix-Vector Multiplication on the GPU