Evaluating performance and portability of OpenCL programs
Cyberscience Center, Tohoku University, Sendai, Miyagi 980-8578, Japan
Fifth International Workshop on Automatic Performance Tuning, 2010
@inproceedings{komatsu2010iwapt,
author={Kazuhiko Komatsu and Katsuto Sato and Yusuke Arai and Kentaro Koyama and Hiroyuki Takizawa and Hiroaki Kobayashi},
title={Evaluating Performance and Portability of OpenCL Programs},
booktitle={The Fifth International Workshop on Automatic Performance Tuning},
year={2010},
month={June}
}
Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices thanks to its higher level of abstraction. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays an important role in exploiting the potential of compute devices, and its capability should therefore be clarified. In this paper, the performance of CUDA and OpenCL programs is quantitatively evaluated. First, several CUDA and OpenCL programs performing almost the same computations are developed, and their performance is compared. Then, the main factors causing the performance differences are investigated. The evaluation results suggest that the performance of OpenCL programs is comparable with that of CUDA programs if the kernel code is appropriately optimized, either by hand or by the compiler. This paper also discusses the differences between the NVIDIA and AMD OpenCL implementations by comparing the performance of their GPUs on the same programs. The comparison shows that the OpenCL C compiler options and the execution configuration parameters have to be optimized for each GPU to obtain its best performance. Therefore, automatic parameter tuning is essential to enable a single OpenCL code to run efficiently on various GPUs.
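The kind of automatic parameter tuning the abstract calls for can be illustrated with a minimal sketch. It is not the authors' method: it is a plain exhaustive search over compiler options and work-group sizes, written in Python for brevity. The names `autotune`, `run_kernel`, and `synthetic_run_kernel` are hypothetical, and the timing function is a synthetic stand-in for building and timing a real OpenCL kernel on a given GPU. The two build options shown are standard OpenCL ones, but the timings attached to them here are invented for illustration only.

```python
import itertools


def autotune(run_kernel, build_options, work_group_sizes, repeats=3):
    """Exhaustively search (build option, work-group size) pairs.

    run_kernel(options, size) is a hypothetical callback that builds the
    kernel with `options`, launches it with `size` work-items per group,
    and returns the elapsed time in seconds. By the convention assumed in
    this sketch, it raises ValueError for configurations the device rejects.
    Returns (best_time, best_options, best_size), or None if every
    configuration was rejected.
    """
    best = None
    for options, size in itertools.product(build_options, work_group_sizes):
        try:
            # Keep the fastest of several runs to reduce timing noise.
            elapsed = min(run_kernel(options, size) for _ in range(repeats))
        except ValueError:
            continue  # unsupported configuration on this device, skip it
        if best is None or elapsed < best[0]:
            best = (elapsed, options, size)
    return best


def synthetic_run_kernel(options, size):
    """Stand-in for a real measurement; the numbers are invented."""
    if size > 256:
        raise ValueError("work-group size exceeds the assumed device limit")
    elapsed = 1.0 + abs(size - 128) / 128.0
    if "-cl-fast-relaxed-math" in options:
        elapsed -= 0.2
    return elapsed


candidate_options = ["", "-cl-mad-enable", "-cl-fast-relaxed-math"]
candidate_sizes = [64, 128, 256, 512]
best_time, best_options, best_size = autotune(
    synthetic_run_kernel, candidate_options, candidate_sizes
)
```

In a real setting the search would be repeated on each target GPU, since a configuration that is fastest on one device is not necessarily fastest on another.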
October 12, 2011 by hgpu