https://hgpu.org/?p=6613
Customizable Memory Schemes for Data Parallel Accelerators