Performance Evaluation of OpenMP’s Target Construct on GPUs: Exploring Compiler Optimizations
Department of Computer Science, Rice University, Houston, TX, USA
International Journal of High Performance Computing and Networking (IJHPCN), 2019
@article{hayashi2019performance,
title={Performance evaluation of OpenMP's target construct on GPUs: exploring compiler optimisations},
author={Hayashi, Akihiro and Shirako, Jun and Tiotto, Ettore and Ho, Robert and Sarkar, Vivek},
journal={International Journal of High Performance Computing and Networking},
volume={13},
number={1},
pages={54--69},
year={2019},
publisher={Inderscience Publishers (IEL)}
}
OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of the GPU architecture. However, such high-level programming models shift the burden of program optimization onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models like CUDA. To study the potential performance improvements from compiling and optimizing high-level programs for GPU execution, in this paper we 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100) and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU code automatically generated by the IBM XL and clang/LLVM compilers.
February 10, 2019 by hgpu