https://hgpu.org/?p=7465
Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs