Run-time support for multi-level disjoint memory address spaces
Departament d’Arquitectura de Computadors, Universitat Politecnica de Catalunya
Universitat Politecnica de Catalunya, 2015
@phdthesis{hedo2015run,
title={Run-time support for multi-level disjoint memory address spaces},
author={Hedo, Javier Bueno},
year={2015},
school={Universitat Polit{`e}cnica de Catalunya}
}
High Performance Computing (HPC) systems have become widely used tools in many industry areas and research fields. Research to produce more powerful and efficient systems has grown in par with their popularity. As a consequence, the complexity of modern HPC architectures has increased in order to provide systems with the highest levels of performance. This increased complexity has also affected the way HPC systems are programmed. HPC users have to deal with new devices, languages and tools, and this is can be a significant access barrier to people that do not have a deep knowledge in computer science. On par with the evolution of HPC systems, programming models have also evolved to ease the task of developing applications for these machines. Two well-known examples are OpenMP and MPI. The former can be used in shared memory systems and is praised for offering an easy methodology of software development. The latter is more popular because it targets distributed environments but it is considered burdensome to use. Besides these two, many programming models have emerged to propose new methodologies or to handle new hardware devices. One of these models is OmpSs. OmpSs is a programming model for modern HPC systems that is based on OpenMP and StarSs. Developed by the Programming Models group at the Barcelona Supercomputing Center, it targets the latest generation of HPC systems while benefiting from the ease of use of OpenMP. OmpSs offers asynchronous parallelism with the concept of tasks with data dependencies. These tasks allow the specification of sections of code that can be executed in parallel while the dependencies specify the restrictions about the order in which the tasks can be executed. With this, OmpSs programs can adapt to a many different system configurations while fundamentally still being sequential programs with annotations. This thesis explores the benefits of providing OmpSs the capability to target architectures with complex memory hierarchies. An example of such systems can be the new generation of clusters that use accelerators to power their computing capabilities. The memory hierarchy of these machines is composed of a first level of distributed memory formed by the memory of each individual node, and a second level formed by the private memory of each accelerator devices. We propose a reference implementation that enables OmpSs programs to run on a cluster with or without accelerators while also providing a competitive performance when compared with other programming models. We also discuss the enhancement of the OmpSs programming model with the support of non-contiguous regions of data. Offering this feature allows applications with complex data accesses to be easily annotated with OmpSs. This is important to widen the spectrum of applications that can be handled by the programming model. We present an implementation and evaluation of the performance and programmability impact of supporting non-contiguous memory regions.
December 15, 2015 by hgpu