Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

Cristian Campos, Rafael Asenjo, Angeles Navarro
Department of Computer Architecture, Universidad de Málaga, 29071 Málaga, Málaga, Spain
The Journal of Supercomputing, 81, 428, 2025

@article{campos2025exploring,
   title={Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU},
   author={Campos, Cristian and Asenjo, Rafael and Navarro, Angeles},
   journal={The Journal of Supercomputing},
   volume={81},
   number={2},
   pages={1--30},
   year={2025},
   publisher={Springer}
}


In recent times, oneAPI has emerged as a competitive framework for optimizing streaming applications on heterogeneous CPU+GPU architectures, since it provides portability and performance thanks to the SYCL programming language and efficient parallel libraries such as oneTBB. However, this approach opens up a wealth of implementation alternatives for this type of application: from how to design the data flow to how to exploit data parallelism. Choosing the best alternative is not trivial, so in this paper we analyze them and contribute an analytical model based on queueing theory that helps in the online selection of the alternative that maximizes the throughput and the occupancy of the CPU and GPU compute units. We explore the design space offered by: a) different APIs to define the data flow (parallel_pipeline and Flow Graph from oneTBB, and SYCL events from SYCL); b) alternative kernel implementations to express data parallelism (SYCL, AVX and std::simd); and c) the mapping of the kernels onto the available computing resources (CPU cores and GPU). The results show that the std::simd library can be 1.54x faster and 3% more energy efficient than AVX while requiring 7.36x less programming effort, and that implementations that enable asynchronous offloading of tasks to the devices, such as those based on the SYCL events and Flow Graph APIs, outperform the other APIs, being up to 1.10x faster and up to 1.18x more energy efficient.
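As a rough illustration of the kind of reasoning a queueing-style throughput model involves, the sketch below estimates the throughput and occupancy of a set of CPU cores and one GPU that drain a shared stream of items. It is not the model from the paper: the `Platform` struct, the function names, and the assumption that service rates simply add up are all hypothetical simplifications chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of a CPU+GPU platform (names are illustrative,
// not taken from the paper).
struct Platform {
    int cpu_cores;    // number of CPU cores processing items
    double cpu_rate;  // items per second that one CPU core can process
    double gpu_rate;  // items per second that the GPU can process (0 if unused)
};

// Aggregate service capacity, assuming the CPU cores and the GPU work
// independently on items taken from the same queue.
double capacity(const Platform& p) {
    assert(p.cpu_cores >= 0 && p.cpu_rate >= 0.0 && p.gpu_rate >= 0.0);
    return p.cpu_cores * p.cpu_rate + p.gpu_rate;
}

// Steady-state throughput: limited by whichever is smaller, the arrival
// rate of the stream or the aggregate capacity.
double throughput(const Platform& p, double arrival_rate) {
    return std::min(arrival_rate, capacity(p));
}

// Occupancy of the compute units: ratio of arrival rate to capacity,
// clamped to 1 when the platform is saturated.
double utilization(const Platform& p, double arrival_rate) {
    const double c = capacity(p);
    return c > 0.0 ? std::min(1.0, arrival_rate / c) : 0.0;
}
```

Under these assumptions, comparing `throughput` and `utilization` across candidate mappings (for example, CPU only versus CPU plus GPU) gives a first-order way to rank alternatives; a realistic model would also have to account for offloading overheads and per-stage service times.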