Evaluating the Performance and Portability of OpenCL
Electronic Systems Group, Faculty of Electrical Engineering, Eindhoven University of Technology
Eindhoven University of Technology, 2011
@mastersthesis{van2011evaluating,
  title  = {Evaluating the Performance and Portability of OpenCL},
  author = {van der Sanden, J.},
  school = {Eindhoven University of Technology},
  year   = {2011}
}
Recent developments in processor architecture have brought about a shift from sequential to parallel processing. This shift was not driven by a breakthrough in processor design; rather, it is an alternative design trajectory adopted to avoid the limits reached in single-core development. Along with the shift towards parallel architectures, a gap arose between sequential programmers and parallel architectures. Several industry efforts have tried to bridge this gap, resulting in parallel programming frameworks such as CUDA, OpenMP and the Cell SDK. A more recent parallel programming standard is OpenCL, maintained by the Khronos Group. OpenCL distinguishes itself by offering the programmer a single, flexible programming framework that can target multiple platforms from different vendors. To what extent OpenCL is a suitable substitute for current programming standards is the main topic of this thesis. The thesis includes a detailed comparison and analysis of the performance of several image-processing algorithms implemented in both CUDA and OpenCL and mapped onto an NVIDIA GPU. Despite the similarity of OpenCL and CUDA, performance differences of up to 16% are observed. Furthermore, the suitability of OpenCL as a single standard for targeting multiple platforms is investigated by mapping and optimizing the image-processing algorithms to other architectures, including an AMD GPU and an Intel multi-core CPU. These cross-platform OpenCL mappings show that neither functional portability nor performance portability can always be guaranteed. A method is proposed to improve performance portability by developing a single OpenCL implementation that ports to multiple target devices, reaching at least 80% of the performance of the optimal implementation on each target device.
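The cross-vendor targeting that the abstract refers to rests on OpenCL's host API, which lets a single program discover whatever platforms and devices (NVIDIA or AMD GPUs, Intel CPUs, and so on) are installed at run time. The following minimal sketch is not taken from the thesis; it only illustrates this discovery step using the standard OpenCL 1.x host calls, with fixed-size buffers chosen for brevity.

```c
/* Illustrative sketch (not from the thesis): enumerate all OpenCL platforms
 * and devices visible on the host, showing how one OpenCL program can target
 * hardware from different vendors. */
#include <stdio.h>
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char pname[128];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);
        printf("Platform: %s\n", pname);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dname[128];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            printf("  Device: %s\n", dname);
        }
    }
    return 0;
}
```

Because the same kernels can in principle be built and launched on any device found this way, functional and performance portability across those devices becomes the central question the thesis investigates.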
October 18, 2011 by hgpu