https://hgpu.org/?p=4796
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs