
Composing multiple StarPU applications over heterogeneous machines: a supervised approach

A.-E. Hugo, A. Guermouche, P.-A. Wacrenier, R. Namyst
INRIA, LaBRI, University of Bordeaux, Talence, France
hal-00824514 (21 May 2013)
@inproceedings{hugo:hal-00824514,

   hal_id={hal-00824514},

   url={http://hal.inria.fr/hal-00824514},

   title={Composing multiple StarPU applications over heterogeneous machines: a supervised approach},

   author={Hugo, Andra-Ecaterina and Guermouche, Abdou and Namyst, Raymond and Wacrenier, Pierre-Andr{\'e}},

   language={English},

   affiliation={RUNTIME – INRIA Bordeaux – Sud-Ouest , Laboratoire Bordelais de Recherche en Informatique – LaBRI , HiePACS – INRIA Bordeaux – Sud-Ouest},

   booktitle={Third International Workshop on Accelerators and Hybrid Exascale Systems},

   address={Boston, {\'E}tats-Unis},

   audience={internationale},

   year={2013},

   month={May},

   pdf={http://hal.inria.fr/hal-00824514/PDF/PID2692011.pdf}

}


Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention. This paper presents an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts that provide confined execution environments which can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduce a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g. resource utilization). We demonstrate the relevance of our approach using benchmarks invoking multiple high performance linear algebra kernels simultaneously on top of heterogeneous multicore machines. We show that our mechanism can dramatically improve the overall application run time (-34%), most notably by reducing the average cache miss ratio (-50%).
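The hypervisor described in the abstract monitors runtime feedback (e.g. resource utilization) per scheduling context and moves computing resources from an under-utilized context to an overloaded one. The mechanism can be sketched independently of StarPU itself; all names and thresholds below are hypothetical illustrations, not StarPU's actual API:

```python
# Illustrative sketch of a feedback-driven hypervisor that rebalances
# workers between two scheduling contexts. Every name here is made up
# for illustration; StarPU's real interface differs.

class SchedulingContext:
    def __init__(self, name, workers):
        self.name = name
        self.workers = set(workers)   # hardware resources owned by this context
        self.busy_time = 0.0          # feedback counters reported by the runtime
        self.total_time = 0.0

    def utilization(self):
        # Fraction of elapsed time the context's workers spent computing.
        return self.busy_time / self.total_time if self.total_time else 0.0

def hypervisor_step(ctx_a, ctx_b, low=0.5, high=0.9):
    """Shrink an under-utilized context and expand an overloaded one.

    Returns the worker that was moved, or None if no move was triggered.
    Thresholds `low` and `high` are arbitrary example values.
    """
    for donor, receiver in ((ctx_a, ctx_b), (ctx_b, ctx_a)):
        if (donor.utilization() < low and receiver.utilization() > high
                and len(donor.workers) > 1):
            worker = donor.workers.pop()   # reclaim one mostly idle worker
            receiver.workers.add(worker)   # grant it to the saturated context
            return worker
    return None

# Example: one linear algebra kernel idles while the other is saturated.
a = SchedulingContext("cholesky", {0, 1, 2})
b = SchedulingContext("qr", {3, 4})
a.busy_time, a.total_time = 2.0, 10.0    # 20% busy
b.busy_time, b.total_time = 9.5, 10.0    # 95% busy
moved = hypervisor_step(a, b)
print(moved in {0, 1, 2}, len(a.workers), len(b.workers))  # True 2 3
```

In the paper's setting the feedback counters come from the runtime system itself rather than being set by hand, and the resize decision is what lets concurrently running libraries share the machine without oversubscribing it.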

