https://hgpu.org/?p=10089
On Benchmarking the Matrix Multiplication Algorithm using OpenMP, MPI and CUDA Programming Languages