https://hgpu.org/?p=1215
Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping