https://hgpu.org/?p=26333
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs