https://hgpu.org/?p=2914
Program Optimization Study on a 128-Core GPU