https://hgpu.org/?p=7765
Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications