https://hgpu.org/?p=9175
Communication-Minimizing 2D Convolution in GPU Registers