https://hgpu.org/?p=2909
Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications