https://hgpu.org/?p=2596
Program Optimization Strategies for Data-Parallel Many-Core Processors