
Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures

Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo
Dept. of Computer Architecture, University of Malaga, Spain
Technical Report. Dept. Comp. Architecture. Univ. of Malaga, 2013

@article{navarro2013strategies,
   title={Strategies for Maximizing Utilization in multi-CPU \& multi-GPU Heterogeneous Architectures},
   author={Navarro, Angeles and Vilches, Antonio and Corbera, Francisco and Asenjo, Rafael},
   year={2013}
}



This paper explores the possibility of efficiently executing a single application on multicore CPUs and multiple GPU accelerators simultaneously, under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel for template so that it can be exploited on heterogeneous architectures. Previous task frameworks that support heterogeneous systems implement a variety of static and dynamic scheduling strategies, but the size of the chunk of iterations assigned to each device is always fixed. Because the computing resources are asymmetric, we instead propose a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance across CPUs and GPUs. We also address the underutilization of the CPU core on which a host thread runs. To solve it, we propose two approaches: i) a collaborative host thread strategy, in which the host thread carries out useful chunk processing instead of busy-waiting for the GPU to complete; to implement this strategy, we modify our partitioning scheme to provide a chunk to the host thread each time a GPU device gets new work; and ii) a host thread blocking strategy combined with oversubscription, which delegates to the OS the duty of scheduling threads onto available CPU cores so that all cores are doing useful work. Using two benchmarks, we evaluate the overhead introduced by our scheduling and partitioning algorithms and find that it is negligible. We also evaluate the efficiency of the proposed strategies and find that allowing oversubscription controlled by the OS can be beneficial under certain scenarios.
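The contrast between fixed-size chunks and adaptively resized chunks can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not give the resizing rule, so the one used here (chunks proportional to each device's throughput, shrinking toward a fair share of the remaining iterations) is an assumption, as are the function name `schedule`, the device names, and the throughput figures. The host thread strategies are not modeled.

```python
import heapq


def schedule(n_iters, throughputs, gpu_chunk, adaptive=True):
    """Simulate dynamic scheduling of a parallel loop over several devices.

    throughputs: dict mapping a (hypothetical) device name to an assumed
                 processing rate in iterations per second.
    gpu_chunk:   chunk size handed to the fastest device.
    adaptive:    if False, every device receives gpu_chunk iterations
                 (the fixed-chunk baseline); if True, chunks are resized.
    Returns (makespan_seconds, plan), where plan lists (device, begin, end).
    """
    fastest = max(throughputs.values())
    total = sum(throughputs.values())
    next_iter = 0
    # Min-heap of (time at which the device becomes idle, device name).
    ready = [(0.0, name) for name in sorted(throughputs)]
    heapq.heapify(ready)
    plan = []
    makespan = 0.0
    while next_iter < n_iters:
        now, dev = heapq.heappop(ready)  # the earliest idle device asks for work
        remaining = n_iters - next_iter
        if adaptive:
            # Assumed rule 1: scale the chunk by relative device speed, so
            # slow and fast devices spend similar time on each chunk.
            proportional = gpu_chunk * throughputs[dev] / fastest
            # Assumed rule 2: never take more than this device's fair share
            # of what is left, which shortens the tail of the loop.
            fair_share = remaining * throughputs[dev] / total
            size = int(min(proportional, fair_share))
        else:
            size = gpu_chunk
        size = max(1, min(remaining, size))
        begin, end = next_iter, next_iter + size
        next_iter = end
        done = now + size / throughputs[dev]
        makespan = max(makespan, done)
        plan.append((dev, begin, end))
        heapq.heappush(ready, (done, dev))
    return makespan, plan


if __name__ == "__main__":
    rates = {"gpu0": 900.0, "cpu0": 100.0}  # hypothetical rates
    for mode in (False, True):
        t, plan = schedule(1000, rates, gpu_chunk=500, adaptive=mode)
        print("adaptive" if mode else "fixed", round(t, 3), len(plan), "chunks")
```

With a fixed chunk size, a slow CPU core that grabs a GPU-sized chunk becomes the bottleneck while the GPU sits idle; resizing the chunk per device keeps both busy until the loop is nearly finished, at the cost of more scheduling steps.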

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
