Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures


Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo

Dept. of Computer Architecture, University of Malaga, Spain

Technical Report. Dept. Comp. Architecture. Univ. of Malaga, 2013

@article{navarro2013strategies,
  title={Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures},
  author={Navarro, Angeles and Vilches, Antonio and Corbera, Francisco and Asenjo, Rafael},
  year={2013}
}


This paper explores the possibility of efficiently executing a single application on multicore CPUs and multiple GPU accelerators simultaneously under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel for template so that it can be exploited on heterogeneous architectures. Previous task frameworks that support heterogeneous systems implement a variety of static and dynamic scheduling strategies, but the size of the chunk of iterations assigned to each device is always fixed. Because the computing resources are asymmetric, we propose a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance of CPUs and GPUs. We also address the underutilization of the CPU core on which a host thread runs. To solve it, we propose two approaches: i) a collaborative host thread strategy, in which the host thread, instead of busy-waiting for the GPU to complete, carries out useful chunk processing; to implement this strategy, we modify our partitioning scheme to provide a chunk to the host thread each time a GPU device gets new work; and ii) a host thread blocking strategy combined with oversubscription, which delegates to the OS the duty of scheduling threads onto available CPU cores in order to guarantee that all cores are doing useful work. Using two benchmarks, we evaluate the overhead introduced by our scheduling and partitioning algorithms and find it to be negligible. We also evaluate the efficiency of the proposed strategies and find that allowing oversubscription controlled by the OS can be beneficial in certain scenarios.
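The contrast between fixed and resized chunks can be illustrated with a small simulation. This is a minimal sketch, not the authors' algorithm: the function name `simulate`, the device names, the throughput figures, and the rule of scaling each device's chunk by its speed relative to the fastest device are all assumptions made for illustration. Each device takes the next chunk of iterations as soon as it becomes free (dynamic scheduling), and the `adaptive` flag switches between one fixed chunk size for every device and a per-device chunk size.

```python
import heapq


def simulate(total_iters, throughputs, base_chunk, adaptive=True):
    """Simulate dynamic scheduling of a parallel loop over several devices.

    throughputs: dict mapping a device name to iterations per time unit
                 (hypothetical values, not measurements from the paper).
    base_chunk:  chunk size handed to the fastest device.
    Returns (makespan, busy) where busy maps each device to its busy time.
    """
    fastest = max(throughputs.values())
    # Min-heap of (time at which the device becomes free, device name).
    free_at = [(0.0, name) for name in sorted(throughputs)]
    heapq.heapify(free_at)
    busy = {name: 0.0 for name in throughputs}
    remaining = total_iters
    makespan = 0.0
    while remaining > 0:
        now, name = heapq.heappop(free_at)  # next device to become idle
        rate = throughputs[name]
        if adaptive:
            # Assumed rule: scale the chunk by relative speed so that every
            # device needs roughly the same time to process one chunk.
            chunk = max(1, round(base_chunk * rate / fastest))
        else:
            chunk = base_chunk  # same chunk size for every device
        chunk = min(chunk, remaining)
        remaining -= chunk
        duration = chunk / rate
        busy[name] += duration
        done = now + duration
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, name))
    return makespan, busy


if __name__ == "__main__":
    # Two fast devices and two slow ones (illustrative rates only).
    rates = {"gpu0": 1000.0, "gpu1": 1000.0, "cpu0": 50.0, "cpu1": 50.0}
    for adaptive in (False, True):
        makespan, busy = simulate(10_000, rates, 1000, adaptive)
        utilization = {d: round(t / makespan, 2) for d, t in busy.items()}
        print("adaptive" if adaptive else "fixed", makespan, utilization)
```

With these illustrative rates, the fixed-chunk run is dominated by the slow devices finishing oversized chunks while the fast devices sit idle, whereas the resized chunks keep all devices busy for most of the run. The sketch ignores transfer costs and scheduling overhead, which a real implementation would have to account for.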

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, nVidia, Performance, Tesla S2050

February 4, 2014 by hgpu

