
Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations

Akihiro Hayashi, Jun Shirako, Ettore Tiotto, Robert Ho, Vivek Sarkar
Department of Computer Science, Rice University, Houston, TX, USA
Rice University, 2018

@article{hayashi2018performance,

   title={Performance Evaluation of OpenMP’s Target Construct on GPUs – Exploring Compiler Optimizations},

   author={Hayashi, Akihiro and Shirako, Jun and Tiotto, Ettore and Ho, Robert and Sarkar, Vivek},

   year={2018}

}

OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP’s high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of GPU architectures. However, such high-level programming models place a greater burden of program optimization on compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models such as CUDA. To study the potential performance improvements obtainable by compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100) and 2) conduct a comparative performance analysis of hand-written CUDA programs and the GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
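
As a minimal illustration of the programming model evaluated in the paper (this sketch is not taken from the paper; the function and array names are hypothetical), an OpenMP 4.x target region offloading a vector addition to the GPU can be written in standard C as follows:

    // Hypothetical example: vector addition offloaded to a GPU using
    // OpenMP 4.x target directives (not code from the paper).
    void vadd(const float *a, const float *b, float *c, int n) {
        // Copy a and b to the device, run the loop there in parallel
        // across teams and threads, and copy c back to the host.
        #pragma omp target teams distribute parallel for \
                map(to: a[0:n], b[0:n]) map(from: c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

When built with an offloading-capable compiler (e.g. clang with -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda, or IBM XL with -qsmp=omp -qoffload), the loop executes as a GPU kernel; with a host-only compiler the directive is simply ignored and the loop runs on the CPU.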