https://hgpu.org/?p=12278
Dynamic loop vectorization for executing OpenCL kernels on CPUs