Enabling task-level scheduling on heterogeneous platforms

Enqiang Sun, Dana Schaa, Richard Bagley, Norman Rubin, David Kaeli
Department of Electrical and Computer Engineering, Northeastern University, Boston MA, USA
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units (GPGPU-5), 2012


@inproceedings{sun2012enabling,
   title={Enabling task-level scheduling on heterogeneous platforms},
   author={Sun, E. and Schaa, D. and Bagley, R. and Rubin, N. and Kaeli, D.},
   booktitle={Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units},
   year={2012}
}




OpenCL is an industry standard for parallel programming on heterogeneous devices. With OpenCL, compute-intensive portions of an application can be offloaded to a variety of processing units within a system. OpenCL is the first standard that focuses on portability, allowing programs to be written once and run seamlessly on multiple, heterogeneous devices, regardless of vendor. While OpenCL has been widely adopted, it still lacks support for automatic task scheduling and data consistency when multiple devices are present in the system. To address this need, we have designed a task queueing extension for OpenCL that provides a high-level, unified execution model tightly coupled with a resource management facility. The main motivation for developing this extension is to provide OpenCL programmers with a convenient programming paradigm to fully utilize all available devices in a system and to incorporate flexible scheduling schemes. To demonstrate the value and utility of this extension, we have used an advanced OpenCL-based imaging toolkit called clSURF. Using our task queueing extension, we demonstrate the potential performance opportunities and limitations given current vendor implementations of OpenCL. Using a state-of-the-art implementation on a single GPU device as the baseline, our task queueing extension achieves speedups of up to 72.4%. Our extension also achieves scalable performance gains on multiple heterogeneous GPU devices. The performance trade-offs of using the host CPU as an accelerator are also evaluated.

