high performance computing on graphics processing units: hgpu.org


Performance portability through machine learning guided kernel selection in SYCL libraries

John Lawson

Codeplay Software Ltd.

arXiv:2008.13145 [cs.PF], (30 Aug 2020)

@misc{lawson2020performance,
  title={Performance portability through machine learning guided kernel selection in SYCL libraries},
  author={John Lawson},
  year={2020},
  eprint={2008.13145},
  archivePrefix={arXiv},
  primaryClass={cs.PF}
}


Package: Towards automated kernel selection in machine learning systems: Supplementary code and dataset


Automatically tuning parallel compute kernels allows libraries and frameworks to achieve good performance on a wide range of hardware; however, these techniques typically focus on finding optimal kernel parameters for particular input sizes and parameters. General-purpose compute libraries must be able to cater to all inputs and parameters provided by a user, so these techniques are of limited use. Additionally, parallel programming frameworks such as SYCL require that kernels be deployed in a binary format embedded within the library. As such, it is impractical to deploy a large number of possible kernel configurations without inflating the library size. Machine learning methods can be used to mitigate both of these problems and provide performance for general-purpose routines with a limited number of kernel configurations. We show that unsupervised clustering methods can be used to select the subset of possible kernels that should be deployed, and that simple classification methods can be trained to select from these kernels at runtime to give good performance. As these techniques are fully automated, relying only on benchmark data, the tuning process for new hardware or problems does not require any developer effort or expertise.
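The two-stage approach in the abstract (cluster benchmark results to pick which kernels to ship, then train a classifier to choose among them at runtime) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: k-means as the clustering method, the best mean normalized performance as the way to pick one representative configuration per cluster, a 1-nearest-neighbour classifier on log-scaled (M, N, K) problem sizes, and toy runtimes. All function names and data are hypothetical. The paper evaluates its own choice of clustering and classification methods, which may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, int, int]  # hypothetical (M, N, K) problem size, e.g. for a GEMM


def normalize(runtimes: Sequence[float]) -> List[float]:
    """Performance of each config relative to the fastest one (1.0 = best)."""
    best = min(runtimes)
    return [best / t for t in runtimes]


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain k-means returning one cluster label per point."""
    # Deterministic farthest-point initialisation (avoids identical starting centroids).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members (leave empty clusters alone).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_kernels(perf: List[List[float]], k: int) -> List[int]:
    """Cluster inputs by performance profile; keep the best config of each cluster."""
    labels = kmeans(perf, k)
    chosen = set()
    for c in range(k):
        members = [row for row, lab in zip(perf, labels) if lab == c]
        if members:
            n_cfg = len(members[0])
            chosen.add(max(range(n_cfg), key=lambda j: sum(r[j] for r in members)))
    return sorted(chosen)


def features(shape: Shape) -> List[float]:
    """Log-scaled problem sizes used as classifier features."""
    return [math.log2(d) for d in shape]


def train_labels(perf: List[List[float]], deployed: List[int]) -> List[int]:
    """Label each benchmarked input with the fastest *deployed* kernel."""
    return [max(deployed, key=lambda j: row[j]) for row in perf]


def predict(shape: Shape, train_shapes: List[Shape], labels: List[int]) -> int:
    """1-nearest-neighbour classifier: reuse the label of the closest benchmarked shape."""
    f = features(shape)
    nearest = min(range(len(train_shapes)),
                  key=lambda idx: math.dist(f, features(train_shapes[idx])))
    return labels[nearest]


# Toy benchmark: runtimes (ms) of four hypothetical kernel configs per shape.
runtimes: Dict[Shape, List[float]] = {
    (64, 64, 64): [1.0, 1.2, 2.0, 1.5],
    (128, 128, 128): [1.1, 1.2, 2.2, 1.4],
    (1024, 1024, 1024): [4.0, 3.0, 2.0, 3.5],
    (2048, 2048, 2048): [8.0, 5.0, 4.0, 6.0],
}
shapes = list(runtimes)
perf = [normalize(runtimes[s]) for s in shapes]

deployed = select_kernels(perf, k=2)  # configs that would be compiled into the library
labels = train_labels(perf, deployed)
print(deployed)                                     # [0, 2]
print(predict((96, 96, 96), shapes, labels))        # 0
print(predict((1536, 1536, 1536), shapes, labels))  # 2
```

In this toy run, clustering separates the small and large shapes, so only configurations 0 and 2 would need to be embedded in the binary. The classifier then routes an unseen shape to whichever of those two kernels won on the nearest benchmarked size. Normalizing runtimes before clustering groups inputs by *which* configurations do well on them rather than by their absolute runtime.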

Tags: AMD R9 Nano, ATI, Auto-Tuning, Benchmarking, Clustering, Computer science, Machine learning, Package, Performance, performance portability, SYCL

September 6, 2020 by hgpu


