high performance computing on graphics processing units: hgpu.org

Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering

Ricardo Nobre, Luis Reis, Joao M. P. Cardoso

Faculty of Engineering, University of Porto, INESC TEC, Porto, Portugal

arXiv:1810.10496 [cs.PF], (24 Oct 2018)

@article{nobre2018improving,
  title={Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering},
  author={Nobre, Ricardo and Reis, Luis and Cardoso, Joao M. P.},
  year={2018},
  month={oct},
  archivePrefix={arXiv},
  primaryClass={cs.PF}
}

Automatic compiler phase selection/ordering has traditionally focused on CPUs and, to a lesser extent, FPGAs. We present experiments on compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders for 15 OpenCL benchmarks targeting an NVIDIA GPU. We analyze the NVIDIA PTX code generated for the various versions to identify the main causes of the most significant improvements, and we present the results of a set of experiments that demonstrate the importance of using specific phase orders. Using specialized compiler phase orders, we achieve geometric mean improvements of 1.54x (up to 5.48x) over PTX generated by the NVIDIA CUDA compiler from CUDA versions of the same kernels, and of 1.65x (up to 5.7x) over execution of the OpenCL kernels compiled from source with the NVIDIA OpenCL driver. We also evaluate the use of code features of the OpenCL kernels. More specifically, we evaluate an approach that reuses the compiler sequences of the 1 or 3 most similar benchmarks, achieving geometric mean improvements of 1.49x and 1.56x, respectively, over the same OpenCL baseline.
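The three ideas in the abstract (iterative exploration of phase orders, geometric mean speedups, and reuse of sequences from the most similar benchmarks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which search strategy, pass set, or code features were used, so the random-sampling search, the pass names in `PASSES`, the `toy_measure` cost model, and the Euclidean feature distance are all assumptions made for illustration.

```python
import math
import random

# Hypothetical pass names for illustration; not the pass set used in the paper.
PASSES = ["licm", "gvn", "instcombine", "loop-unroll", "sroa"]


def geometric_mean(values):
    """Geometric mean of positive values, e.g. per-benchmark speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def explore(measure, passes=PASSES, iterations=200, max_len=6, seed=0):
    """Random-sampling exploration: keep the fastest sequence seen so far.

    `measure(sequence)` is supplied by the caller and returns a runtime.
    The empty sequence is used as the starting point.
    """
    rng = random.Random(seed)
    best_seq, best_time = [], measure([])
    for _ in range(iterations):
        length = rng.randint(1, max_len)
        candidate = [rng.choice(passes) for _ in range(length)]
        elapsed = measure(candidate)
        if elapsed < best_time:
            best_seq, best_time = candidate, elapsed
    return best_seq, best_time


def nearest_benchmarks(features, known, k=1):
    """Names of the k known benchmarks closest in Euclidean feature distance."""
    return sorted(known, key=lambda name: math.dist(features, known[name]))[:k]


def toy_measure(sequence):
    """Stand-in cost model (an assumption): 'licm' followed by 'gvn' helps."""
    time = 10.0
    if "licm" in sequence:
        time -= 2.0
        if "gvn" in sequence[sequence.index("licm"):]:
            time -= 3.0
    return time


if __name__ == "__main__":
    seq, runtime = explore(toy_measure)
    print("best sequence:", seq, "runtime:", runtime)
    print("speedup over empty sequence:", toy_measure([]) / runtime)
```

In a real setting, `measure` would compile the kernel with the candidate sequence and time it on the GPU, and `geometric_mean` would be applied to the per-benchmark speedups. For a new kernel, `nearest_benchmarks` with `k=1` or `k=3` would pick the previously explored benchmarks whose sequences are worth trying first.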

Tags: Computer science, nVidia, nVidia GeForce GTX 1070, OpenCL, Performance, PTX

October 28, 2018 by hgpu
