https://hgpu.org/?p=16147
Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation