Hierarchical DAG Scheduling for Hybrid Distributed Systems

Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra

The University of Tennessee, Knoxville, USA

29th IEEE International Parallel & Distributed Processing Symposium, 2015

@inproceedings{wu:hal-01078359,
  title={Hierarchical DAG Scheduling for Hybrid Distributed Systems},
  author={Wu, Wei and Bouteiller, Aurelien and Bosilca, George and Faverge, Mathieu and Dongarra, Jack},
  url={https://hal.inria.fr/hal-01078359},
  booktitle={29th IEEE International Parallel & Distributed Processing Symposium},
  address={Hyderabad, India},
  year={2015},
  month={May},
  keywords={PaRSEC runtime ; GPU ; dense linear algebra; heterogeneous architecture},
  hal_id={hal-01078359},
  hal_version={v1}
}

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle to map both the resulting multi-dimensional heterogeneity and the expression of algorithmic parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms can alleviate some of the programming challenges on distributed hybrid many-core architectures. In this paper we take this concept a step further by showing that the potential of task-based programming paradigms can be greatly increased with minimal modification of the underlying runtime, combined with the right algorithmic changes. We propose two novel recursive algorithmic variants for one-sided factorizations and describe the changes to the PaRSEC task-scheduling runtime needed to build a framework in which the task granularity is dynamically adjusted to adapt the degree of available parallelism and kernel efficiency to runtime conditions. Based on an extensive set of results, we show that, for one-sided factorizations, i.e. Cholesky and QR, a carefully written algorithm supported by an adaptive task-based runtime is capable of reaching a degree of performance and scalability never achieved before in distributed hybrid environments.
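To make the idea of a recursive one-sided factorization concrete, here is a minimal sequential sketch of a recursive Cholesky factorization in plain Python. It is not the authors' algorithm and does not use PaRSEC. The function names and the fixed `cutoff` parameter are assumptions made for illustration: the paper adjusts granularity dynamically at runtime, whereas this sketch only shows how a recursion depth determines how fine the leaf kernels become.

```python
import math


def cholesky_unblocked(a):
    """Leaf kernel: plain Cholesky of a small dense SPD block (list of lists)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def solve_right_lower_transposed(b, l):
    """Return X such that X * L^T = B, with L lower triangular (TRSM-like step)."""
    n = len(l)
    x = [[0.0] * n for _ in b]
    for r, row in enumerate(b):
        for j in range(n):
            s = row[j] - sum(x[r][k] * l[j][k] for k in range(j))
            x[r][j] = s / l[j][j]
    return x


def recursive_cholesky(a, cutoff=2):
    """Factor SPD matrix A as L * L^T by recursive 2x2 block splitting.

    `cutoff` (hypothetical knob) is the block size at or below which the
    recursion stops and the unblocked leaf kernel is used.
    """
    n = len(a)
    if n <= max(cutoff, 1):
        return cholesky_unblocked(a)
    h = n // 2
    m = n - h
    a11 = [row[:h] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]
    l11 = recursive_cholesky(a11, cutoff)            # factor top-left block
    l21 = solve_right_lower_transposed(a21, l11)     # panel solve
    s22 = [[a22[i][j] - sum(l21[i][k] * l21[j][k] for k in range(h))
            for j in range(m)] for i in range(m)]    # trailing update (SYRK-like)
    l22 = recursive_cholesky(s22, cutoff)            # factor the updated block
    top = [l11[i] + [0.0] * m for i in range(h)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom


# Small SPD example; a smaller cutoff yields more, finer-grained leaf calls.
a = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 2.0],
     [2.0, 3.0, 6.0, 3.0],
     [0.0, 2.0, 3.0, 6.0]]
l_fine = recursive_cholesky(a, cutoff=1)
l_coarse = recursive_cholesky(a, cutoff=4)
```

In a task-based setting, each of the four steps in the recursive branch would typically become a task, or a sub-DAG of tasks, so the depth of the recursion trades parallelism against per-kernel efficiency. Here both cutoffs produce the same factor, only through a different number of kernel calls.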

Tags: Algorithms, Computer science, CUDA, Factorization, Linear Algebra, nVidia, Tesla K40, Tesla M2090

January 5, 2015 by hgpu
