https://hgpu.org/?p=6596
Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming