
Optimising OpenCL kernels for the ARM Mali-T600 GPUs

Johan Gronqvist, Anton Lokhmotov
ARM
Chapter in "GPU Pro 5" book, 2014

@article{gronqvist2014optimising,
   title={Optimising OpenCL kernels for the ARM Mali-T600 GPUs},
   author={Gronqvist, Johan and Lokhmotov, Anton},
   year={2014}
}


OpenCL is a relatively young industry-backed standard API that aims to provide functional portability across systems equipped with computational accelerators such as GPUs: a standard-conforming OpenCL program can be executed on any standard-conforming OpenCL implementation. OpenCL, however, does not address the issue of performance portability: transforming an OpenCL program to achieve higher performance on one device may actually lead to lower performance on another device, since performance may depend significantly on low-level details, such as iteration space mapping and data layout [Howes et al. 10, Ryoo et al. 08]. Due to the popularity of certain GPU architectures, some optimisations have become hallmarks of GPU computing, e.g. coalescing global memory accesses or using local memory. Emerging mobile and embedded OpenCL-capable GPUs, however, have a rather different organisation. Therefore, even seasoned GPU developers may need to forgo their instincts and learn new techniques when optimising for battery-powered GPU brethren. In this chapter, we introduce the ARM Mali-T600 GPU series (Section 1.2) and discuss the performance characteristics of several versions of the Sobel edge detection filter (Section 1.3) and of general matrix multiplication (Section 1.4). We make no claim that the presented versions are the fastest possible implementations of the selected algorithms. Rather, we aim to provide insight into which transformations should be considered when optimising kernel code for the Mali-T600 GPUs. Therefore, the described behaviour may differ from the actual behaviour for expository purposes. We perform our experiments on an Arndale development board powered by the Samsung Exynos 5250 chip. The Exynos 5250 comprises a dual-core Cortex-A15 CPU at 1.7 GHz and a quad-core Mali-T604 GPU at 533 MHz. The OpenCL driver version is 3.0 Beta.
