Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations
Department of Computer Science, Rice University, Houston, TX, USA
International Journal of High Performance Computing and Networking (IJHPCN), 2019
@article{hayashi2019performance,
title={Performance evaluation of OpenMP's target construct on GPUs: exploring compiler optimisations},
author={Hayashi, Akihiro and Shirako, Jun and Tiotto, Ettore and Ho, Robert and Sarkar, Vivek},
journal={International Journal of High Performance Computing and Networking},
volume={13},
number={1},
pages={54--69},
year={2019},
publisher={Inderscience Publishers (IEL)}
}
OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of the GPU architecture. However, such high-level programming models shift the burden of program optimization onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models like CUDA. To study the potential performance improvements from compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100) and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU code automatically generated by the IBM XL and clang/LLVM compilers.
February 10, 2019 by hgpu