https://hgpu.org/?p=6510
An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization in CUDA