https://hgpu.org/?p=19001
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels