high performance computing on graphics processing units: hgpu.org


Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow

Pietro Ghiglio, Uwe Dolinsky, Mehdi Goli, Kumudha Narasimhan

Codeplay Software Ltd., Edinburgh, Scotland, UK

Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM ’22), 2022

DOI:10.1145/3528425.3529099

@inproceedings{ghiglio2022improving,
  title={Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow},
  author={Ghiglio, Pietro and Dolinsky, Uwe and Goli, Mehdi and Narasimhan, Kumudha},
  booktitle={Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores},
  pages={1--10},
  year={2022}
}


The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, automotive, artificial intelligence, machine learning, and other areas necessitates efficient compiler and runtime support for a growing number of platforms. Existing SYCL implementations support devices such as CPUs, GPUs, DSPs, and FPGAs, typically via OpenCL or CUDA backends. While accelerators have significantly increased the performance of user applications, using CPU devices for further performance improvement is beneficial because CPUs are widely present in existing datacenters. SYCL applications on CPUs currently go through an OpenCL backend. Though an OpenCL backend is valuable for supporting accelerators, it may introduce additional overhead on CPUs, where the host and the device are the same. Overheads such as run-time compilation of the kernel, transfer of input/output memory to and from the OpenCL device, and invocation of the OpenCL kernel may not be necessary when running on the CPU. While some of these overheads (such as data transfer) can be avoided by modifying the application, doing so can compromise the SYCL application's ability to achieve performance portability on other devices. In this paper, we propose an alternative approach to running SYCL applications on CPUs. We bypass OpenCL and use a CPU-directed compilation flow, together with Whole Function Vectorization, to generate optimized host and device code in the same translation unit. We compare the performance of our approach, the CPU-directed compilation flow, with that of an OpenCL backend for existing SYCL-based applications, without modifying their code. We run experiments across various CPU architectures to demonstrate the efficacy of the proposed approach.
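As a rough illustration of the idea in the abstract, the sketch below uses plain standard C++ with no SYCL or OpenCL dependency. It is not the authors' implementation. It only models the concept: a per-work-item kernel is compiled in the same translation unit as the host code and called directly, with no run-time kernel compilation or buffer copies, and a "wide" variant handles several adjacent work-items per call, which is roughly the effect Whole Function Vectorization aims for. The names `saxpy_item`, `saxpy_items_wide`, `run_saxpy` and the width `kWidth` are hypothetical and chosen for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vector width (number of work-items handled per wide call).
constexpr std::size_t kWidth = 4;

// Scalar kernel body: what a single work-item with index i computes.
inline void saxpy_item(std::size_t i, float a, const float* x, float* y) {
    y[i] = a * x[i] + y[i];
}

// Wide kernel body: one call covers kWidth adjacent work-items.
// A vectorizing compiler can map this fixed-trip-count loop onto SIMD lanes;
// here it is written by hand only to show the shape of the transformed code.
inline void saxpy_items_wide(std::size_t base, float a, const float* x, float* y) {
    for (std::size_t lane = 0; lane < kWidth; ++lane) {
        y[base + lane] = a * x[base + lane] + y[base + lane];
    }
}

// Host-side "launch": iterates over the index space directly in host memory.
// Assumes x holds at least y.size() elements.
void run_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = y.size();
    std::size_t i = 0;
    // Main part: full groups of kWidth work-items.
    for (; i + kWidth <= n; i += kWidth) {
        saxpy_items_wide(i, a, x.data(), y.data());
    }
    // Remainder: leftover work-items fall back to the scalar kernel.
    for (; i < n; ++i) {
        saxpy_item(i, a, x.data(), y.data());
    }
}
```

Calling `run_saxpy(2.0f, x, y)` on two equally sized vectors updates `y` in place. Because host and kernel code share one translation unit, the compiler is free to inline and optimize across the call boundary, which an OpenCL-style run-time compilation path would not allow in the same way.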

Tags: Computer science, CUDA, nVidia, OpenCL, Performance, performance portability, SYCL

May 1, 2022 by hgpu
