https://hgpu.org/?p=3105
Inter-Block GPU Communication via Fast Barrier Synchronization