Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

Akihiro Hayashi, Jun Shirako, Ettore Tiotto, Robert Ho, Vivek Sarkar
Department of Computer Science, Rice University, Houston, TX, USA
Rice University, 2018








OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of GPU architectures. However, such high-level programming models generally place an additional optimization burden on compilers and runtime systems. Without those optimizations, OpenMP programs can be slower than fully hand-tuned, and even naive, implementations written in low-level programming models like CUDA. To study the potential performance improvements from compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100) and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU programs generated automatically by the IBM XL and clang/LLVM compilers.
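
For readers unfamiliar with the target construct, the following is a minimal sketch of what such a directive-annotated loop can look like in C. It is not taken from the paper's benchmarks: the function name `saxpy_target` and the SAXPY kernel are assumptions chosen only for illustration. The combined `target teams distribute parallel for` directive and the `map` clauses are standard OpenMP 4.0 features.

```c
/* Hypothetical example (not from the paper): y[i] = a * x[i] + y[i].
 * With an offloading-capable compiler the loop may run on a GPU.
 * Without OpenMP support the pragma is ignored and the loop runs
 * sequentially on the host, producing the same result. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    /* map(to: ...) copies x to the device before the region starts;
     * map(tofrom: ...) copies y to the device and back to the host. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source contains no explicit thread or block indices. How iterations are distributed across GPU teams and threads is left to the compiler and runtime, which is the kind of decision whose performance impact the paper evaluates.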