https://hgpu.org/?p=5752
Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU