Optimizing OpenCL Local Work Group Size With Machine Learning
Computer Science, School of Informatics, University of Edinburgh
University of Edinburgh, 2014
@article{schlafli2014optimizing,
   title={Optimizing OpenCL Local Work Group Size With Machine Learning},
   author={Schlafli, Markus},
   year={2014}
}
GPU architectures are becoming increasingly important due to their high number of processing cores. The single instruction, multiple data (SIMD) architecture has proven to work not just in the graphics domain, but in many other disciplines as well, because the potential performance of a consumer-level GPU is significantly higher than that of its CPU counterpart. High-level abstractions, such as OpenCL and CUDA, have been introduced to separate the software from the hardware, allowing programmers to develop for a variety of GPUs. However, the programmer still needs highly specialized knowledge of the underlying architecture in order to get optimal performance out of a system. An important part of this is splitting the workload into manageable chunks, known as the work group size in OpenCL, which can be distributed across the cores of a GPU. Because of the memory access behaviour and computational complexity of a GPU program, this parameter is often tuned on a trial-and-error basis. State-of-the-art analytical models and tools choose the optimal work group size with an accuracy of less than 30%. We propose a solution using predictive modelling from the domain of machine learning that chooses the optimal work group size up to 86% of the time. OpenCL was the GPGPU framework used.
August 18, 2015 by hgpu