https://hgpu.org/?p=13832
Benchmarking the cost of thread divergence in CUDA