Composability of parallel codes on heterogeneous architectures


Andra-Ecaterina Hugo

Université de Bordeaux

HAL: tel-01162975 (11 June 2015)

@article{hugo2015composability,
  title={Composability of parallel codes on heterogeneous architectures},
  author={Hugo, Andra},
  year={2015}
}


To meet the ever more demanding accuracy and speed requirements of scientific simulations, the high-performance computing (HPC) community is constantly increasing its demands in terms of parallelism, thus adding tremendous value to parallel libraries that are strongly optimized for highly complex architectures. Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a uniform runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention.

In this thesis, we present an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts, which provide confined execution environments that can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduce a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g. resource utilization).

We demonstrate the relevance of this approach by extending an existing generic sparse direct solver (qr_mumps) to use these mechanisms, and we introduce a new decomposition method, based on proportional mapping, that is used to build the scheduling contexts. To cope with the very irregular behavior of the application, the hypervisor dynamically manages the allocation of resources. By means of the scheduling contexts and the hypervisor, we improve the locality and thus the overall performance of the solver.
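As a rough illustration of the resizing idea described in the abstract, the sketch below splits a fixed pool of workers among contexts in proportion to their pending workload, and then recomputes the split when the workloads change. It is a simplified, flat stand-in and not the thesis's proportional-mapping method or the StarPU API: the function name `proportional_allocation`, the context names and the workload figures are all hypothetical, and the largest-remainder rounding is an assumption made only to get integer worker counts.

```python
def proportional_allocation(workloads, n_workers):
    """Split n_workers among contexts in proportion to their workloads.

    Hypothetical helper (not part of StarPU). Uses the largest-remainder
    method so the integer shares sum exactly to n_workers.
    """
    total = sum(workloads.values())
    if total <= 0:
        raise ValueError("total workload must be positive")
    # Ideal (fractional) share of each context.
    ideal = {name: n_workers * w / total for name, w in workloads.items()}
    # Start from the floor of each ideal share.
    shares = {name: int(x) for name, x in ideal.items()}
    leftover = n_workers - sum(shares.values())
    # Hand the remaining workers to the largest fractional parts.
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Initial split of 12 workers among three hypothetical contexts.
initial = proportional_allocation({"ctx_a": 600, "ctx_b": 300, "ctx_c": 100}, 12)

# Later, feedback shows the load has shifted, so the split is recomputed;
# this plays the role of the "expand or shrink" decision.
resized = proportional_allocation({"ctx_a": 100, "ctx_b": 300, "ctx_c": 600}, 12)
```

In this toy version a context with a very small workload can end up with zero workers; a real resizing policy would presumably enforce minimum sizes and take worker types (CPU cores, GPUs) into account.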

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, Sparse direct solvers, Tesla M2070, Thesis

June 26, 2015 by hgpu

