https://hgpu.org/?p=3794
Efficient Parallel Intra-prediction Mode Selection Scheme for 4x4 Blocks in H.264