https://hgpu.org/?p=4131
Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark