Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives
Center for Research and Advanced Studies of the National Polytechnic Institute, Mathematics Department (ABACUS-CINVESTAV-IPN) Box 14-740, 07000 Mexico City, D.F.
Scientific Programming, 2015
@article{couder2015performance,
title={Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives},
author={Couder-Castaneda, C and Barrios-Pina, H and Gitler, I and Arroyo, M},
year={2015}
}
In this work, a serial source code for simulating supersonic ejector flow is accelerated through parallelization based on OpenMP and OpenACC directives. Directives are used to reduce development costs and to simplify maintenance of the application, given the complexity of the FORTRAN source code. OpenMP has become the programming standard for scientific multi-core software. Similarly, OpenACC is a viable alternative for graphics accelerators such as NVIDIA GPUs, because it avoids programming low-level kernels, where mistakes can lead to costly and complicated implementations. This research follows well-proven strategies to obtain the best performance with both OpenMP and OpenACC. The OpenMP strategies are oriented toward reducing the creation of parallel regions, creating tasks to handle boundary conditions, and nested control of the time loop when programming in offload mode, specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions across kernel executions. Performance and validation experiments are conducted on a 12-core Xeon CPU, a Xeon Phi (TM) 5110p, and a TESLA C2070, with the TESLA C2070 giving the best performance. It achieved acceleration factors of 9.86X, 1.6X, and 4.5X relative to the serial reference version on CPU, the 12-core Xeon CPU, and the Xeon Phi (TM), respectively.
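Two of the strategies named in the abstract, reducing the creation of OpenMP parallel regions and keeping an OpenACC data region alive across kernel executions, can be illustrated with a minimal sketch. It is written in C rather than the FORTRAN of the original code, and it does not reproduce the authors' solver: the 1D explicit diffusion update, the function names `advance_openmp` and `advance_openacc`, and the parameters `u`, `tmp`, `n`, `steps`, and `alpha` are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for one solver sweep: explicit 1D diffusion.
   u holds the field, tmp is scratch space, both of length n.
   The end points u[0] and u[n-1] are left untouched (fixed boundaries). */

/* OpenMP variant: a single parallel region encloses the whole time loop,
   so the thread team is created once instead of once per time step. */
void advance_openmp(double *u, double *tmp, int n, int steps, double alpha)
{
    assert(n >= 3);
    #pragma omp parallel
    {
        /* Every thread runs the time loop; t is private to each thread. */
        for (int t = 0; t < steps; ++t) {
            #pragma omp for
            for (int i = 1; i < n - 1; ++i)
                tmp[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
            /* The implicit barrier after "omp for" separates the sweeps. */
            #pragma omp for
            for (int i = 1; i < n - 1; ++i)
                u[i] = tmp[i];
        }
    }
}

/* OpenACC variant: one data region spans all time steps, so u is copied
   to the device once, stays resident across kernels, and is copied back
   once at the end. */
void advance_openacc(double *u, double *tmp, int n, int steps, double alpha)
{
    assert(n >= 3);
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int t = 0; t < steps; ++t) {
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                tmp[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                u[i] = tmp[i];
        }
    }
}
```

A compiler without OpenMP or OpenACC support ignores the pragmas and runs both functions serially with the same result, which reflects the maintenance argument for directives: the serial source stays intact. With support enabled, the intended saving is the repeated thread-team start-up in the OpenMP case and the repeated host-device transfers in the OpenACC case.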
June 10, 2015 by hgpu