https://hgpu.org/?p=23473
Designing Efficient Barriers and Semaphores for Graphics Processing Units