https://hgpu.org/?p=5160
Analyzing program flow within a many-kernel OpenCL application