https://hgpu.org/?p=19719
ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs