https://hgpu.org/?p=7742
Using Fermi architecture knowledge to speed up CUDA and OpenCL programs