Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations
Department of Computer Science, Rice University, Houston, TX, USA
Rice University, 2018
@article{hayashi2018performance,
title={Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations},
author={Hayashi, Akihiro and Shirako, Jun and Tiotto, Ettore and Ho, Robert and Sarkar, Vivek},
year={2018}
}
OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP’s high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran, without exposing too many details of GPU architectures. However, such high-level programming models generally shift the burden of program optimization onto compilers and runtime systems; otherwise, OpenMP programs could be slower than fully hand-tuned, or even naive, implementations written in low-level programming models like CUDA. To study potential performance improvements from compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100) and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
August 26, 2018 by hgpu