high performance computing on graphics processing units: hgpu.org

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Joshua Hoke Davis, Christopher Daley, Swaroop Pophale, Thomas Huber, Sunita Chandrasekaran, Nicholas J. Wright

University of Delaware, Newark DE 19716, USA

arXiv:2010.09454 [cs.PF] (20 Oct 2020)

@misc{davis2020performance,
  title={Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs},
  author={Joshua Hoke Davis and Christopher Daley and Swaroop Pophale and Thomas Huber and Sunita Chandrasekaran and Nicholas J. Wright},
  year={2020},
  eprint={2010.09454},
  archivePrefix={arXiv},
  primaryClass={cs.PF}
}

Heterogeneous systems are becoming increasingly prevalent. To exploit the rich compute resources of such systems, application developers need robust programming models that let them migrate legacy code from today’s systems to tomorrow’s. For more than a decade, directives have been established as one of the promising paths for tackling programming challenges on emerging systems. This work applies and demonstrates OpenMP offloading directives on five proxy applications. We observe that performance varies widely from one compiler to another; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While the developer can work around some issues, others must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC’s Cori system when using the Clang compiler, and a 15.7x speedup by switching max reductions to add reductions in the laplace mini-app when using the Cray-llvm compiler on Cori.
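The abstract does not show code, so the following is only a minimal sketch of what "switching max reductions to add reductions" could look like for a Laplace-style convergence check. The function names `residual_max` and `residual_sum` are hypothetical, and replacing the maximum with a sum of absolute changes is an assumption made for illustration; the authors' actual change may differ.

```c
#include <assert.h>

/* Hypothetical helper: largest pointwise change between two iterates.
   The offloaded loop uses a max reduction. */
double residual_max(const double *old_v, const double *new_v, int n)
{
    assert(n > 0);
    double err = 0.0;
    #pragma omp target teams distribute parallel for reduction(max:err) \
        map(to: old_v[0:n], new_v[0:n]) map(tofrom: err)
    for (int i = 0; i < n; ++i) {
        double d = new_v[i] - old_v[i];
        if (d < 0.0) d = -d;      /* absolute value without libm */
        if (d > err) err = d;     /* running maximum */
    }
    return err;
}

/* Hypothetical helper: sum of absolute pointwise changes.
   Same loop, but with an add reduction (an assumed substitute norm). */
double residual_sum(const double *old_v, const double *new_v, int n)
{
    assert(n > 0);
    double sum = 0.0;
    #pragma omp target teams distribute parallel for reduction(+:sum) \
        map(to: old_v[0:n], new_v[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; ++i) {
        double d = new_v[i] - old_v[i];
        if (d < 0.0) d = -d;
        sum += d;                 /* running sum */
    }
    return sum;
}
```

Because the sum of absolute changes is never smaller than the largest single change, testing `residual_sum` against a tolerance is at least as strict as testing `residual_max` against the same tolerance. Compiled without OpenMP support, the pragmas are ignored and both functions run serially on the host.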

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, OpenACC, OpenMP, Performance, Tesla V100

October 25, 2020 by hgpu
