https://hgpu.org/?p=16780
Hardware thread reordering to boost OpenCL throughput on FPGAs