Optimizing OpenCL Local Work Group Size With Machine Learning

Markus Schlafli
Computer Science, School of Informatics, University of Edinburgh, 2014

@article{Schlafli2014,
   title={Optimizing OpenCL Local Work Group Size With Machine Learning},
   author={Schlafli, Markus},
   year={2014}
}




GPU architectures are becoming increasingly important due to their large number of processing cores. The single instruction, multiple data (SIMD) execution model has proven effective not only in the graphics domain but in many other disciplines as well, because the potential performance of a consumer-level GPU is significantly higher than that of its CPU counterpart. High-level abstractions such as OpenCL and CUDA have been introduced to separate the software from the hardware, allowing programmers to develop for a variety of GPUs. However, the programmer still needs highly specialized knowledge of the underlying architecture to get optimal performance from a system. An important part of this is splitting the workload into manageable chunks, known in OpenCL as the work group size, that can be distributed across the cores of a GPU. Because the memory access times and computational complexity of a GPU program are difficult to model, this parameter is often tuned on a trial-and-error basis. State-of-the-art analytical models and tools choose the optimal work group size with an accuracy of less than 30%. We propose a solution using predictive modelling, from the domain of machine learning, that chooses the optimal work group size up to 86% of the time. OpenCL was the GPGPU framework used in this work.
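To give a sense of the parameter space the abstract describes, the sketch below (illustrative only, not taken from the thesis) enumerates the legal local work group sizes for a 2D NDRange. It encodes two standard OpenCL constraints: each local dimension must evenly divide the corresponding global dimension, and the product of the local dimensions must not exceed the device's CL_DEVICE_MAX_WORK_GROUP_SIZE. The global size and the 256 work-item limit are assumed example values.

```python
from itertools import product

def candidate_local_sizes(global_size, max_work_group_size):
    """Enumerate valid 2D local work group sizes for a global NDRange.

    OpenCL requires each local dimension to divide the matching global
    dimension, and the product of the local dimensions must not exceed
    the device's CL_DEVICE_MAX_WORK_GROUP_SIZE.
    """
    gx, gy = global_size
    divs_x = [d for d in range(1, gx + 1) if gx % d == 0]
    divs_y = [d for d in range(1, gy + 1) if gy % d == 0]
    return [(lx, ly)
            for lx, ly in product(divs_x, divs_y)
            if lx * ly <= max_work_group_size]

# Example: a 1024x1024 NDRange on a device limited to 256 work items
# per work group (a common value; the real limit is queried with
# clGetDeviceInfo).
sizes = candidate_local_sizes((1024, 1024), 256)
```

Even this small example yields dozens of legal configurations with widely varying performance, which is why exhaustive trial-and-error tuning is costly and a predictive model that picks the best configuration directly is attractive.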
