https://hgpu.org/?p=9019
Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs