Optimizing OpenCL Local Work Group Size With Machine Learning
Computer Science, School of Informatics, University of Edinburgh
University of Edinburgh, 2014
@article{schlafli2014optimizing,
   title={Optimizing OpenCL Local Work Group Size With Machine Learning},
   author={Schlafli, Markus},
   year={2014}
}
GPU architectures are becoming increasingly important due to their high number of processing cores. The single instruction, multiple data (SIMD) architecture has proven to work not just in the graphics domain, but in many other disciplines as well, because the potential performance of a consumer-level GPU is significantly higher than that of its CPU counterpart. High-level abstractions, such as OpenCL and CUDA, have been introduced to separate the software from the hardware, allowing programmers to develop for a variety of GPUs. However, the programmer still needs highly specialized knowledge of the underlying architecture in order to get optimal performance out of a system. An important part of this is splitting the workload into manageable chunks, known as the work group size in OpenCL, which can be distributed across the cores of a GPU. Because of the memory access behaviour and computational complexity of a GPU program, this parameter is often tuned on a trial-and-error basis. State-of-the-art analytical models and tools choose the optimal work group size with an accuracy of less than 30%. We propose a solution using predictive modelling from the domain of machine learning that chooses the optimal work group size up to 86% of the time. OpenCL was the GPGPU framework used.
August 18, 2015 by hgpu