14173

Composability of parallel codes on heterogeneous architectures

Andra-Ecaterina Hugo
Universite de Bordeaux
HAL: tel-01162975, (11 June 2015)

@article{hugo2015composability,

   title={Composability of parallel codes on heterogeneous architectures},

   author={Hugo, Andra},

   year={2015}

}

Download Download (PDF)   View View   Source Source   

1402

views

To face the ever demanding requirements in term of accuracy and speed of scientific simulations, the High Performance community is constantly increasing the demands in term of parallelism, adding thus tremendous value to parallel libraries strongly optimized for highly complex architectures.Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a uniform runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource over subscription, undesirable cache flushes or memory bus contention.In this thesis, we present an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts that provide confined execution environments which can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduced a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g. resource utilization). We demonstrated the relevance of this approach by extending an existing generic sparse direct solver (qr mumps) to use these mechanisms and introduced a new decomposition method based on proportional mapping that is used to build the scheduling contexts. In order to cope with the very irregular behavior of the application, the hypervisor manages dynamically the allocation of resources. By means of the scheduling contexts and the hypervisor we improved the locality and thus the overall performance of the solver.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: