
Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations

Akihiro Hayashi, Jun Shirako, Ettore Tiotto, Robert Ho, Vivek Sarkar
Department of Computer Science, Rice University, Houston, TX, USA
International Journal of High Performance Computing and Networking (IJHPCN), 2019

@article{hayashi2019performance,
   title={Performance evaluation of OpenMP’s target construct on GPUs - exploring compiler optimisations},
   author={Hayashi, Akihiro and Shirako, Jun and Tiotto, Ettore and Ho, Robert and Sarkar, Vivek},
   journal={International Journal of High Performance Computing and Networking},
   volume={13},
   number={1},
   pages={54--69},
   year={2019},
   publisher={Inderscience Publishers (IEL)}
}


OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP’s high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of GPU architectures. However, such high-level programming models shift a substantial optimization burden onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations in low-level programming models such as CUDA. To study the potential performance improvements from compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100), and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU code automatically generated by the IBM XL and clang/LLVM compilers.
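
As a minimal illustration of the target construct discussed in the abstract (this sketch is not taken from the paper; the array names, sizes, and kernel are illustrative), the loop below offloads a vector addition to a GPU using OpenMP 4.x accelerator directives. With clang/LLVM such a program can typically be built with -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda; IBM XL provides a comparable offloading mode.

/* Minimal sketch (illustrative, not from the paper): vector addition
   offloaded to a GPU with OpenMP target directives. */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void) {
    float *a = malloc(N * sizeof(float));
    float *b = malloc(N * sizeof(float));
    float *c = malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Offload the loop to the device: map the inputs to the GPU,
       map the result back, and spread iterations across teams/threads. */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}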