high performance computing on graphics processing units: hgpu.org


Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

C. Couder-Castaneda, H. Barrios-Pina, I. Gitler, M. Arroyo

Center for Research and Advanced Studies of the National Polytechnic Institute, Mathematics Department (ABACUS-CINVESTAV-IPN) Box 14-740, 07000 Mexico City, D.F.

Scientific Programming, 2015

@article{couder2015performance,
  title={Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC and GPU using OpenMP, OpenMP+LEO, and OpenACC directives},
  author={Couder-Castaneda, C and Barrios-Pina, H and Gitler, I and Arroyo, M},
  year={2015}
}


In this work, a serial source code for simulating supersonic ejector flow is accelerated through parallelization based on OpenMP and OpenACC directives. The directive-based approach is intended to reduce development costs and simplify maintenance of the application, given the complexity of the FORTRAN source code. OpenMP has become the programming standard for scientific multi-core software. Similarly, OpenACC is a viable alternative for graphics accelerators such as NVIDIA GPUs, without the need to program low-level kernels, where mistakes can lead to costly and complicated implementations. This research follows well-proven strategies to obtain the best performance with both OpenMP and OpenACC. The OpenMP strategies are oriented toward reducing the creation of parallel regions, creating tasks to handle boundary conditions, and nested control of the time loop when programming in offload mode, specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions across kernel executions. Performance and validation experiments are conducted on a 12-core Xeon CPU, a Xeon Phi (TM) 5110P, and a Tesla C2070, with the last delivering the best performance. The Tesla C2070 achieved acceleration factors of 9.86X, 1.6X, and 4.5X relative to the serial reference version on the CPU, the 12-core Xeon CPU, and the Xeon Phi (TM), respectively.
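Two of the strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the paper's application is written in FORTRAN, while the sketch below uses C, and the 1D smoothing stencil and the function names `run_openmp` and `run_openacc` are hypothetical stand-ins for the flow solver. The first function opens a single OpenMP parallel region around the whole time loop, so threads are created once instead of once per time step. The second keeps one OpenACC data region open across all kernel launches, so the arrays are assumed to stay on the device between time steps.

```c
/* Hypothetical 1D smoothing stencil standing in for the solver update.
   Boundary values u[0] and u[n-1] are left unchanged. */

/* OpenMP: one parallel region encloses the whole time loop. Every thread
   executes the time loop; the "omp for" constructs split the inner loops
   and end with an implicit barrier, which orders the two phases. */
void run_openmp(double *u, double *tmp, int n, int steps)
{
    #pragma omp parallel
    {
        for (int t = 0; t < steps; ++t) {
            #pragma omp for
            for (int i = 1; i < n - 1; ++i)
                tmp[i] = 0.5 * u[i] + 0.25 * (u[i - 1] + u[i + 1]);

            #pragma omp for
            for (int i = 1; i < n - 1; ++i)
                u[i] = tmp[i];
        }
    }
}

/* OpenACC: one data region spans all time steps. u is copied to the
   device on entry and back on exit; tmp exists only on the device.
   The kernels inside declare the arrays as already present. */
void run_openacc(double *u, double *tmp, int n, int steps)
{
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int t = 0; t < steps; ++t) {
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                tmp[i] = 0.5 * u[i] + 0.25 * (u[i - 1] + u[i + 1]);

            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                u[i] = tmp[i];
        }
    }
}
```

Both functions take the field array, a scratch array of the same length, the array length, and the number of time steps. A compiler without OpenMP or OpenACC support ignores the pragmas and runs the loops serially, which gives the same result. The task-based boundary handling and the Xeon Phi offload mode mentioned in the abstract are not covered by this sketch.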

Tags: Computer science, Fortran, Intel Xeon Phi, nVidia, OpenACC, OpenMP, Performance, Tesla C2070

June 10, 2015 by hgpu
