Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs
Zuse Institute Berlin, Berlin, Germany
International Workshop on OpenCL (IWOCL’21), 2021
@inproceedings{baumann2021performance,
title={Performance Evaluation and Improvements of the PoCL Open-Source OpenCL Implementation on Intel CPUs},
author={Baumann, Tobias and Noack, Matthias and Steinke, Thomas},
booktitle={International Workshop on OpenCL},
pages={1--12},
year={2021}
}
The Portable Computing Language (PoCL) is a vendor-independent open-source OpenCL implementation that aims to support a variety of compute devices in a single platform. Evaluating PoCL against the Intel OpenCL implementation reveals significant performance drawbacks of PoCL on Intel CPUs – which run 92 % of the TOP500 list. Using a selection of benchmarks, we identify and analyse performance issues in PoCL with a focus on scheduling and vectorisation. We propose a new CPU device driver based on Intel Threading Building Blocks (TBB), and evaluate LLVM with respect to automatic compiler vectorisation across work-items in PoCL. Using the TBB driver, it is possible to narrow the gap to Intel OpenCL and even outperform it by a factor of up to 1.3× in our proxy application benchmark with a manual vectorisation strategy.
May 9, 2021 by hgpu