https://hgpu.org/?p=5261
Efficient implementation of GPGPU synchronization primitives on CPUs