Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives


C. Couder-Castaneda, H. Barrios-Pina, I. Gitler, M. Arroyo

Center for Research and Advanced Studies of the National Polytechnic Institute, Mathematics Department (ABACUS-CINVESTAV-IPN) Box 14-740, 07000 Mexico City, D.F.

Scientific Programming, 2015

@article{couder2015performance,
  title={Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives},
  author={Couder-Castaneda, C and Barrios-Pina, H and Gitler, I and Arroyo, M},
  year={2015}
}


In this work, a serial source code for simulating supersonic ejector flow is accelerated through parallelization based on OpenMP and OpenACC directives. The directive-based approach is chosen to reduce development costs and simplify maintenance of the application, given the complexity of the Fortran source code. OpenMP has become the programming standard for scientific multi-core software. Similarly, OpenACC is a viable alternative for graphics accelerators such as NVIDIA GPUs, because it avoids programming low-level kernels, where mistakes can lead to costly and complicated implementations. This research follows well-proven strategies to obtain the best performance with both OpenMP and OpenACC. The OpenMP strategies are to reduce the number of parallel regions created, to create tasks that handle the boundary conditions, and, for offload-mode programming on the Xeon Phi specifically, to use nested control of the time loop. In OpenACC, the strategy focuses on keeping the data regions alive across kernel executions. Performance and validation experiments are conducted on a 12-core Xeon CPU, a Xeon Phi (TM) 5110P, and a Tesla C2070, with the Tesla C2070 giving the best performance. It achieved acceleration factors of 9.86X, 1.6X, and 4.5X relative to the serial reference version on the CPU, the 12-core Xeon CPU, and the Xeon Phi (TM), respectively.
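Two of the strategies named in the abstract can be illustrated with a minimal sketch: enclosing the whole time loop in one OpenMP parallel region, and keeping one OpenACC data region open across all kernel launches. The code below is an assumption-laden illustration, not the authors' implementation. The paper's solver is written in Fortran and is not reproduced on this page; C is used here only so the sketch compiles with or without directive support. The 1D diffusion update, the grid size `N`, and the function names `run_openmp` and `run_openacc` are all hypothetical stand-ins.

```c
#define N 64  /* hypothetical grid size, not taken from the paper */

/* OpenMP sketch: a single parallel region encloses the time loop, so the
   thread team is created once rather than once per time step. */
void run_openmp(double *u, double *tmp, int steps)
{
    #pragma omp parallel
    {
        /* t is declared inside the region, so each thread has its own copy
           and all threads walk the time loop together. */
        for (int t = 0; t < steps; ++t) {
            /* Stand-in kernel: explicit diffusion update of interior cells. */
            #pragma omp for
            for (int i = 1; i < N - 1; ++i)
                tmp[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
            /* The implicit barrier at the end of "omp for" guarantees that
               tmp is complete before it is copied back. */
            #pragma omp for
            for (int i = 1; i < N - 1; ++i)
                u[i] = tmp[i];
        }
    }
}

/* OpenACC sketch: one data region spans the time loop, so u is copied to
   the device once and back once, and tmp lives only on the device. */
void run_openacc(double *u, double *tmp, int steps)
{
    #pragma acc data copy(u[0:N]) create(tmp[0:N])
    {
        for (int t = 0; t < steps; ++t) {
            #pragma acc parallel loop present(u[0:N], tmp[0:N])
            for (int i = 1; i < N - 1; ++i)
                tmp[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);

            #pragma acc parallel loop present(u[0:N], tmp[0:N])
            for (int i = 1; i < N - 1; ++i)
                u[i] = tmp[i];
        }
    }
}
```

A compiler without OpenMP or OpenACC support ignores the pragmas, and both functions then run serially with the same result. Calling either function with `u` holding the initial field and `tmp` as scratch storage of the same length advances the field by `steps` updates, with the two end cells held fixed.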

Tags: Computer science, Fortran, Intel Xeon Phi, nVidia, OpenACC, OpenMP, Performance, Tesla C2070

June 10, 2015 by hgpu
