https://hgpu.org/?p=18930
PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion