https://hgpu.org/?p=12721
Scalable Kernel Fusion for Memory-Bound GPU Applications