https://hgpu.org/?p=7389
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation