Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures


Raphael Bleuse, Thierry Gautier, Joao V. F. Lima, Gregory Mounie, Denis Trystram

Univ. Grenoble Alpes, France

arXiv:1402.6601 [cs.DC], (26 Feb 2014)

@article{2014arXiv1402.6601B,
  author={{Bleuse}, R. and {Gautier}, T. and {Lima}, J.~V.~F. and {Mouni{\'e}}, G. and {Trystram}, D.},
  title={{Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1402.6601},
  primaryClass={cs.DC},
  keywords={Computer Science – Distributed, Parallel, and Cluster Computing},
  year={2014},
  month={feb},
  adsurl={http://adsabs.harvard.edu/abs/2014arXiv1402.6601B},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}


Source code package: xkaapi 2.1


Efficient implementations of parallel applications on heterogeneous hybrid architectures require a careful balance between computation and communication with accelerator devices. Even if most of the communication time can be overlapped with computation, it is essential to reduce the total volume of communicated data. The literature therefore abounds with ad hoc methods for reaching that balance, but these are architecture- and application-dependent. We propose a generic mechanism to automatically optimize the scheduling between CPUs and GPUs, and compare two strategies within this mechanism: the classical Heterogeneous Earliest Finish Time (HEFT) algorithm and our new, parametrized Distributed Affinity Dual Approximation (DADA) algorithm, which groups tasks by affinity before running a fast dual approximation. We ran experiments on a heterogeneous parallel machine with six CPU cores and eight NVIDIA Fermi GPUs. Three standard dense linear algebra kernels from the PLASMA library were ported on top of the Xkaapi runtime, and we report their performance. The results show that both HEFT and DADA perform well under various experimental conditions, but that DADA performs better for larger systems and larger numbers of GPUs and, in most cases, generates much less data transfer than HEFT to achieve the same performance.
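
To make the trade-off between computation and data transfer concrete, here is a minimal Python sketch of the earliest-finish-time placement rule on which HEFT-style list schedulers are built. It is not the Xkaapi API and not the authors' implementation: the `Task` and `Worker` classes, the fixed per-block transfer time, and the assumption that CPU workers never pay a transfer cost are all hypothetical simplifications. Task dependencies and HEFT's ranking phase are left out, so the tasks are assumed to arrive already sorted by priority.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cost: dict          # worker kind -> execution time, e.g. {"cpu": 5.0, "gpu": 1.0}
    inputs: tuple = ()  # names of the data blocks the task reads


@dataclass
class Worker:
    name: str
    kind: str                                   # "cpu" or "gpu"
    ready_at: float = 0.0                       # time at which the worker becomes free
    resident: set = field(default_factory=set)  # data blocks already on the device


def transfer_time(task, worker, per_block):
    """Time to bring missing input blocks to the worker (hypothetical cost model)."""
    if worker.kind == "cpu":
        return 0.0  # assumption: all data already lives in host memory
    missing = [block for block in task.inputs if block not in worker.resident]
    return per_block * len(missing)


def finish_time(task, worker, per_block):
    """Estimated completion time of the task if it is placed on this worker."""
    return worker.ready_at + transfer_time(task, worker, per_block) + task.cost[worker.kind]


def schedule_eft(tasks, workers, per_block=1.0):
    """Place each task, in the given order, on the worker that finishes it earliest."""
    placement = {}
    for task in tasks:
        best = min(workers, key=lambda w: finish_time(task, w, per_block))
        done = finish_time(task, best, per_block)
        best.ready_at = done
        best.resident.update(task.inputs)  # inputs stay cached on the chosen worker
        placement[task.name] = (best.name, done)
    return placement


workers = [Worker("cpu0", "cpu"), Worker("gpu0", "gpu")]
tasks = [
    Task("t1", {"cpu": 5.0, "gpu": 1.0}, inputs=("A",)),
    Task("t2", {"cpu": 5.0, "gpu": 1.0}, inputs=("A",)),
    Task("t3", {"cpu": 1.0, "gpu": 3.0}, inputs=("B",)),
]
placement = schedule_eft(tasks, workers, per_block=2.0)
print(placement)
```

In this toy run, `t2` follows `t1` onto the GPU because block `A` is already resident there, while `t3` stays on the CPU since moving block `B` would cost more than it saves. A rule of this kind optimizes finish time only; the abstract's point is that grouping tasks by affinity, as DADA does, can additionally reduce the volume of data transferred.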

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, Package, Tesla C2050

February 28, 2014 by hgpu

