https://hgpu.org/?p=6165
Parallelization of maximum likelihood fits with OpenMP and CUDA