https://hgpu.org/?p=2427
Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA