Towards global composition of performance-aware components for GPU-based systems

Usman Dastgeer, Christoph Kessler
IDA, Linköping University, 58183 Linköping, Sweden
17th Int. Workshop on Compilers for Parallel Computers (CPC-2013), 2013








An important program optimization, especially for heterogeneous parallel systems, is performance-aware implementation selection: the static or dynamic selection among multiple implementation variants of the same computation, depending on the current execution context (such as currently available resources or performance-affecting parameter values). Doing this for multiple component calls inside a program, while considering interference between call executions due to resource sharing and data flow, is referred to as the global component composition problem. In this work, we study the greedy HEFT (Heterogeneous Earliest Finish Time) heuristic scheduler, which considers one component call at a time and is used by many GPU-based runtime systems for performance-aware implementation selection. We discuss its effectiveness for component composition in programs containing more than one component call on a GPU-based system, and show composition scenarios, with both independent and data-dependent component calls, in which this heuristic might produce an overall sub-optimal schedule. Furthermore, we describe four coordination constructs that model relationships between component calls hierarchically and can be used to make better composition decisions. We then discuss a composition scenario with two or more component calls constrained inside a data dependency chain and propose a bulk scheduling heuristic that can make better decisions by considering the data dependencies between the calls in the chain. Because our global composition framework supports program control and data flow analysis when injecting the composition code, such global heuristics can be implemented in an automated way. The effectiveness of our bulk scheduling heuristic is evaluated using two examples on a GPU-based system.
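To make the contrast between per-call greedy selection and chain-wide selection concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a toy cost model with two devices ("cpu" and "gpu"), fixed predicted execution times per call, and a single constant transfer cost whenever the chain's data has to move between memories. All names (`Call`, `TRANSFER`, `greedy_chain`, `bulk_chain`) and all numbers are hypothetical, and the "bulk" variant is one plausible reading of a heuristic that places a whole data dependency chain on one device.

```python
from dataclasses import dataclass

# Hypothetical constant cost of moving the chain's operand data between
# main memory and GPU memory (an assumption, not a measured value).
TRANSFER = 5.0


@dataclass
class Call:
    name: str
    exec_time: dict  # device name -> predicted execution time (assumed known)


def transfer_cost(src, dst):
    """No cost if the data is already where it is needed."""
    return 0.0 if src == dst else TRANSFER


def greedy_chain(calls, start="cpu", end="cpu"):
    """HEFT-like greedy choice: decide one call at a time, picking the
    device with the earliest finish time for that call alone."""
    loc, total, choice = start, 0.0, []
    for call in calls:
        dev = min(call.exec_time,
                  key=lambda d: transfer_cost(loc, d) + call.exec_time[d])
        total += transfer_cost(loc, dev) + call.exec_time[dev]
        loc = dev
        choice.append(dev)
    # The final result is assumed to be needed at `end`.
    return choice, total + transfer_cost(loc, end)


def bulk_chain(calls, start="cpu", end="cpu"):
    """Chain-wide choice: estimate the cost of running the whole chain on
    each device (including transfers in and out) and take the cheapest."""
    def cost(dev):
        return (transfer_cost(start, dev)
                + sum(c.exec_time[dev] for c in calls)
                + transfer_cost(dev, end))
    best = min(calls[0].exec_time, key=cost)
    return [best] * len(calls), cost(best)


# Four data-dependent calls; each is faster on the GPU, but not by enough
# to pay for the transfer when a single call is considered in isolation.
chain = [Call(f"call{i}", {"cpu": 10.0, "gpu": 6.0}) for i in range(4)]
greedy_choice, greedy_total = greedy_chain(chain)
bulk_choice, bulk_total = bulk_chain(chain)
print(greedy_choice, greedy_total)
print(bulk_choice, bulk_total)
```

Under these assumed numbers the greedy selection keeps every call on the CPU, because each call in isolation sees the transfer cost as outweighing the GPU speedup, while the chain-wide estimate amortizes the transfer over all four calls and prefers the GPU. Real HEFT also accounts for worker availability, which this sketch omits.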
