high performance computing on graphics processing units: hgpu.org

User-Driven Online Kernel Fusion for SYCL

Victor Perez, Lukas Sommer, Victor Lomüller, Kumudha Narasimhan, Mehdi Goli

Codeplay Software Ltd., UK

ACM Transactions on Architecture and Code Optimization, 2022

DOI:10.1145/3571284

Source codes

Package:

SYCL-DNN: a library implementing neural network algorithms written using SYCL

Heterogeneous programming models are becoming increasingly popular to support the ever-evolving hardware architectures, especially for new and emerging specialized accelerators optimizing specific tasks. While such programs provide performance portability of existing applications across various heterogeneous architectures to some extent, short-running device kernels can hurt an application's performance due to the overheads of data transfer, synchronization and kernel launch. In applications with one or two short-running kernels the overhead can be negligible, but it becomes noticeable when these short-running kernels dominate the overall number of kernels in an application, as is the case in graph-based neural network models, where there are several small memory-bound nodes alongside a few large compute-bound nodes. To reduce the overhead, combining several kernels into a single, more optimized kernel is an active area of research. However, this task can be time-consuming and error-prone given the huge set of potential combinations. This can push programmers to seek a trade-off between (a) task-specific kernels with low overhead that are hard to maintain and (b) smaller modular kernels with higher overhead that are easier to maintain. While there are DSL-based approaches, such as those provided for machine learning frameworks, which offer the possibility of such a fusion, they are limited to a particular domain and exploit specific knowledge of that domain and, as a consequence, are hard to port elsewhere. This study explores the feasibility of user-driven kernel fusion through an extension to the SYCL API to address the automation of kernel fusion. The proposed solution requires programmers to define the subgraph regions that are potentially suitable for fusion without any modification to the kernel code or the function signature. We evaluate the performance benefit of our approach on common neural networks and study the performance improvement in detail.
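To make the idea of kernel fusion concrete, here is a minimal sketch in plain C++, not SYCL and not the API extension proposed in the paper. It assumes two hypothetical element-wise "kernels" (`add_bias` and `relu`, names chosen only for illustration) and shows the same computation done unfused, with an intermediate buffer and two passes over memory, and fused, with a single pass. On a real device, each unfused pass would also pay its own launch and synchronization cost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a short, memory-bound kernel: add a bias to each element.
std::vector<float> add_bias(const std::vector<float>& in, float bias) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + bias;
    }
    return out;
}

// Hypothetical stand-in for a second short kernel: ReLU activation.
std::vector<float> relu(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    }
    return out;
}

// Unfused: two separate "launches" and one intermediate buffer
// that is written by the first kernel and read back by the second.
std::vector<float> bias_relu_unfused(const std::vector<float>& in, float bias) {
    return relu(add_bias(in, bias));
}

// Fused: one "launch"; the intermediate value stays in a local
// variable instead of making a round trip through memory.
std::vector<float> bias_relu_fused(const std::vector<float>& in, float bias) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float t = in[i] + bias;
        out[i] = t > 0.0f ? t : 0.0f;
    }
    return out;
}
```

Both variants return the same values for the same input. Writing the fused version by hand for every useful combination of kernels is the maintenance burden the abstract describes, which is why the authors propose letting the runtime perform the fusion on regions marked by the programmer.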

Tags: Compilers, Computer science, Deep learning, OpenCL, Package, SYCL

November 27, 2022 by hgpu
