high performance computing on graphics processing units: hgpu.org

On Runtime Systems for Task-based Programming on Heterogeneous Platforms

Samuel Thibault

LaBRI – Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux – Sud-Ouest

tel-01959127, 18 December 2018

Simulation has become pervasive in science. Real experimentation remains an essential step in scientific research, but simulation has replaced a wide range of costly, lengthy, or even dangerous experiments. It requires massive computing power, however, and scientists will always welcome bigger and faster platforms that let them simulate ever more accurately and extensively. The HPC field has kept providing such platforms, with various shifts over the decades, from vector computers to clusters. The past decade appears to have seen another such shift, as shown by the TOP500 list of the fastest supercomputers: to stay in the race, most of the largest platforms now include accelerators such as GPGPUs or Xeon Phi, making them heterogeneous systems. Programming such systems is significantly more complex than programming the homogeneous platforms we were used to, because it requires orchestrating asynchronous accelerator operations alongside the usual computation and network communication, to the point that optimizing execution by hand no longer seems reasonable. A deep trend that has emerged to cope with this new complexity is the use of task-based programming models. These are not new, but they have regained a lot of interest lately, showing up in a large variety of industrial platforms and research projects, with various programming interfaces and features. A key part that is often forgotten, misunderstood, or simply ignored is the underlying runtime system that manages the tasks. Yet this is where a task-based programming model makes an extremely wide range of optimization and support possible.
As we will see in this document, this model is indeed very appealing for runtime systems: since they know the set of tasks that will have to be executed, which data those tasks will access, possibly an estimate of their duration, etc., extensive possibilities open up that are not reachable by an operating system with the current limited system interfaces. It is increasingly heard in conference keynotes that runtime systems will be key to HPC's continued success. Thanks to such rich information from applications, runtime systems can bring a lot of questions to the table: task scheduling of course, but also transfer optimization, memory management, performance feedback, etc. When the PhD thesis of Cedric Augonnet started in 2008, we began addressing a few of these questions within a runtime system, StarPU. During the decade that followed, we deepened the investigations and opened new directions, which I will discuss in this document. We chose to stay focused on the runtime aspects, leaving somewhat aside, for instance, the programming languages that can be introduced to make task-based programming easier. The runtime part itself kept raising various challenges, which arise in task-based runtime systems in general. These challenges turn out to be related to various other research topics, so collaboration with the respective research teams became only natural: task scheduling of course, but also network communication, statistics, performance visualization, etc. Conversely, the existence of an actual working runtime system, StarPU, provided those teams with interesting test cases for their respective approaches. It also allowed various research projects to be conducted around it without contributing directly to StarPU.
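
To make concrete what a runtime can do once it knows the tasks, the data they access, and an estimate of their duration, here is a minimal toy sketch in Python. It is not StarPU's API or its scheduling policy: the names `Task`, `infer_dependencies`, and `schedule`, the worker names, and all durations are hypothetical and chosen only for illustration. The sketch assumes tasks are submitted in sequential order, infers dependencies from the declared reads and writes, and places each task on the worker that would finish it earliest.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    reads: tuple    # names of the data the task reads
    writes: tuple   # names of the data the task writes
    est: dict       # estimated duration per worker kind (hypothetical values)
    deps: set = field(default_factory=set)


def infer_dependencies(tasks):
    """Derive dependencies from declared data accesses, in submission order."""
    last_writer = {}  # data name -> last task that wrote it
    readers = {}      # data name -> tasks that read it since the last write
    for t in tasks:
        for d in t.reads:
            if d in last_writer:
                t.deps.add(last_writer[d].name)   # read-after-write
            readers.setdefault(d, []).append(t)
        for d in t.writes:
            if d in last_writer:
                t.deps.add(last_writer[d].name)   # write-after-write
            for r in readers.get(d, []):
                if r is not t:
                    t.deps.add(r.name)            # write-after-read
            last_writer[d] = t
            readers[d] = []
    return tasks


def schedule(tasks, workers):
    """Greedy placement: each task goes to the worker that finishes it earliest.

    `workers` maps a worker name to its kind, e.g. {"cpu0": "cpu"}.
    Submission order is a valid topological order, because dependencies
    only ever point to previously submitted tasks.
    """
    free_at = {w: 0.0 for w in workers}
    finish = {}
    plan = []
    for t in tasks:
        ready = max((finish[d] for d in t.deps), default=0.0)
        best = None
        for w, kind in workers.items():
            if kind not in t.est:
                continue  # no implementation of this task for this worker kind
            start = max(ready, free_at[w])
            end = start + t.est[kind]
            if best is None or end < best[2]:
                best = (w, start, end)
        if best is None:
            raise ValueError(f"no worker can run task {t.name}")
        w, start, end = best
        free_at[w] = end
        finish[t.name] = end
        plan.append((t.name, w, start, end))
    return plan


# Hypothetical workload: two initializations, a product, and a reduction.
tasks = infer_dependencies([
    Task("init_A", (), ("A",), {"cpu": 1.0}),
    Task("init_B", (), ("B",), {"cpu": 1.0}),
    Task("gemm", ("A", "B"), ("C",), {"cpu": 8.0, "gpu": 2.0}),
    Task("norm", ("C",), (), {"cpu": 1.0}),
])
workers = {"cpu0": "cpu", "gpu0": "gpu"}
plan = schedule(tasks, workers)

for name, worker, start, end in plan:
    print(f"{name:7s} on {worker}: {start:4.1f} -> {end:4.1f}")
```

In this toy run the application never states that `gemm` must wait for both initializations, nor that it should run on the accelerator; both follow from the declared data accesses and the duration estimates. A real runtime system would additionally have to account for data transfers, memory limits, and inaccurate estimates, which this sketch ignores.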

Tags: Computer science, CUDA, Distributed computing, Heterogeneous systems, nVidia, nVidia Quadro FX 5800, OpenCL, Operating systems, StarPU, Task scheduling, Tesla C2050, Tesla K20, Tesla M2075, Thesis

December 23, 2018 by hgpu
