Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs
Indiana University, Bloomington, IN 47405, USA
Proceedings of the Second International Workshop on GPUs and Scientific Applications (GPUScA), PACT 2011, 2011
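@inproceedings{gunarathne2011optimizing,
title={Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs},
author={Gunarathne, T. and Salpitikorala, B. and Chauhan, A. and Fox, G.},
booktitle={Proceedings of the Second International Workshop on GPUs and Scientific Applications (GPUScA), PACT 2011},
year={2011}
}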
We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, Multi-Dimensional Scaling (MDS), and PageRank. We implemented each kernel in OpenCL and evaluated its performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of the individual components of each kernel, we explored four main optimization techniques: (1) caching invariant data in GPU memory across iterations, (2) selectively placing data in different memory levels, (3) rearranging data in memory, and (4) dividing the work between the GPU and the CPU. These optimizations improved performance by up to 5X over naive OpenCL implementations. We believe that these categories of optimizations are also applicable to other similar kernels. Finally, we draw several lessons that would be useful not only in implementing other similar kernels with OpenCL, but also in devising code generation strategies for compilers that target GPGPUs through OpenCL.
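As an illustration of technique (3), rearranging data in memory, the sketch below converts a flat point-major buffer (all coordinates of point 0, then point 1, and so on) into a dimension-major buffer in which the same coordinate of consecutive points sits at consecutive addresses. On a GPU this kind of layout typically lets neighboring work items read neighboring memory locations. The abstract does not say which rearrangement the authors applied, so this is only one common form of the idea; the function names `to_dimension_major` and `read_coord` are hypothetical, and plain Python lists stand in for device buffers.

```python
def to_dimension_major(points_flat, num_points, num_dims):
    """Rearrange a flat point-major buffer into a dimension-major buffer.

    Input layout:  [p0d0, p0d1, ..., p1d0, p1d1, ...]
    Output layout: [p0d0, p1d0, ..., p0d1, p1d1, ...]
    """
    out = [0.0] * (num_points * num_dims)
    for p in range(num_points):
        for d in range(num_dims):
            # Coordinate d of point p moves from index p*num_dims + d
            # to index d*num_points + p.
            out[d * num_points + p] = points_flat[p * num_dims + d]
    return out


def read_coord(buf, p, d, num_points):
    """Read coordinate d of point p from a dimension-major buffer."""
    return buf[d * num_points + p]


# Three 2-D points stored point-major: (1, 2), (3, 4), (5, 6).
points = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dimension_major = to_dimension_major(points, num_points=3, num_dims=2)
```

In the rearranged buffer, work item `p` reading coordinate `d` touches index `d * num_points + p`, so items `p` and `p + 1` access adjacent elements for any fixed `d`.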
September 25, 2011 by hgpu