
Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels

Jianbin Fang, Henk Sips, Pekka Jaaskelainen, Ana Lucia Varbanescu
Delft University of Technology
The 43rd International Conference on Parallel Processing (ICPP’14)

@inproceedings{fang2014grover,
   author={Jianbin Fang and Henk Sips and Pekka Jaaskelainen and Ana Lucia Varbanescu},
   title={Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels},
   booktitle={Proceedings of the 43rd International Conference on Parallel Processing (ICPP’14)},
   year={2014},
   month={September},
   location={Minneapolis, USA},
   url={http://www.pds.ewi.tudelft.nl/fileadmin/pds/homepages/fang/papers/icpp2k14a214.pdf},
   topic={Parallel Programming},
   group={PDS}
}


Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical approach: by disabling the use of local memory in OpenCL kernels, we enable users to compare the kernel versions with and without local memory, and further choose the best performing version for a given platform.
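The empirical selection described above can be sketched as a small timing harness. This is a minimal illustration rather than the authors' tooling: the names `best_of` and `choose_variant` are hypothetical, and each variant is assumed to be a zero-argument callable that launches one kernel version and waits for it to finish.

```python
import time


def best_of(run_kernel, repeats=5):
    """Return the smallest wall-clock time (seconds) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_kernel()
        best = min(best, time.perf_counter() - start)
    return best


def choose_variant(variants, repeats=5):
    """Time every kernel variant and return (fastest label, all timings).

    `variants` maps a label (for example "with_local") to a zero-argument
    callable that runs that kernel version to completion on the target device.
    """
    timings = {label: best_of(run, repeats) for label, run in variants.items()}
    fastest = min(timings, key=timings.get)
    return fastest, timings
```

Taking the minimum over several runs is one common way to reduce timing noise; the measurement has to be repeated on each platform, since the faster version may differ between a GPU and a CPU.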
To this end, we have designed Grover, a method to automatically remove local memory usage from OpenCL kernels. In particular, we create a correspondence between the global and local memory spaces, which is used to replace local memory accesses with global memory accesses. We have implemented this scheme in the LLVM framework as a compiler pass, which automatically transforms an OpenCL kernel with local memory into a version without it. We have validated Grover with 11 applications, and found that it can successfully disable local memory usage for all of them. We have compared the kernels with and without local memory on three different processors, and found performance improvements for more than a third of the test cases after Grover disabled local memory usage. We conclude that such a compiler pass can be beneficial for performance, and, because it is fully automated, it can be used as an auto-tuning step for OpenCL kernels.
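To make the idea of a local-to-global correspondence concrete, the sketch below emulates a tiled matrix transpose in plain Python, once with a per-work-group tile standing in for local memory and once with every tile read replaced by the matching global read. It is an illustration under stated assumptions, not Grover's LLVM pass: the transpose kernel, the function names, and the sequential loops standing in for work-groups and work-items are all hypothetical, and the matrix dimensions are assumed to be multiples of the tile size.

```python
def transpose_with_local(src, height, width, tile):
    """Emulate a kernel that stages each tile in local memory first."""
    out = [0] * (height * width)
    for gy in range(height // tile):          # work-group index (rows)
        for gx in range(width // tile):       # work-group index (columns)
            local = [[0] * tile for _ in range(tile)]
            # Phase 1: work-item (lx, ly) copies one element into the tile.
            for ly in range(tile):
                for lx in range(tile):
                    local[ly][lx] = src[(gy * tile + ly) * width + gx * tile + lx]
            # A work-group barrier would sit here in a real kernel.
            # Phase 2: each work-item reads the transposed tile position.
            for ly in range(tile):
                for lx in range(tile):
                    out[(gx * tile + ly) * height + gy * tile + lx] = local[lx][ly]
    return out


def transpose_without_local(src, height, width, tile):
    """Same kernel after substituting the correspondence
    local[a][b] <-> src[(gy*tile + a)*width + gx*tile + b]."""
    out = [0] * (height * width)
    for gy in range(height // tile):
        for gx in range(width // tile):
            for ly in range(tile):
                for lx in range(tile):
                    # local[lx][ly] becomes a direct global read.
                    out[(gx * tile + ly) * height + gy * tile + lx] = (
                        src[(gy * tile + lx) * width + gx * tile + ly]
                    )
    return out
```

Both functions produce the same output; the second needs neither the staging copy nor the barrier, which is why the two versions can perform differently depending on how a platform implements local memory.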
