https://hgpu.org/?p=4421
Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring