high performance computing on graphics processing units: hgpu.org

Run-time support for multi-level disjoint memory address spaces

Javier Bueno Hedo

Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya

Universitat Politècnica de Catalunya, 2015

@phdthesis{hedo2015run,
  title={Run-time support for multi-level disjoint memory address spaces},
  author={Hedo, Javier Bueno},
  year={2015},
  school={Universitat Polit{\`e}cnica de Catalunya}
}

High Performance Computing (HPC) systems have become widely used tools in many industries and research fields. Research into more powerful and efficient systems has grown along with their popularity, and as a consequence the complexity of modern HPC architectures has increased in order to deliver the highest levels of performance. This increased complexity has also affected the way HPC systems are programmed. HPC users have to deal with new devices, languages and tools, which can be a significant barrier to entry for people without deep knowledge of computer science. Alongside the evolution of HPC systems, programming models have also evolved to ease the task of developing applications for these machines. Two well-known examples are OpenMP and MPI. The former targets shared-memory systems and is praised for offering an easy methodology of software development. The latter is more popular because it targets distributed environments, but it is considered burdensome to use. Besides these two, many programming models have emerged to propose new methodologies or to handle new hardware devices.

One of these models is OmpSs, a programming model for modern HPC systems based on OpenMP and StarSs. Developed by the Programming Models group at the Barcelona Supercomputing Center, it targets the latest generation of HPC systems while retaining the ease of use of OpenMP. OmpSs offers asynchronous parallelism through tasks with data dependencies: tasks mark sections of code that can be executed in parallel, while dependencies restrict the order in which those tasks may execute. As a result, OmpSs programs can adapt to many different system configurations while remaining, fundamentally, sequential programs with annotations. This thesis explores the benefits of giving OmpSs the capability to target architectures with complex memory hierarchies.
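The task-and-dependency model described above can be sketched in plain C. This is a hypothetical illustration, not code from the thesis: it uses standard OpenMP 4.0 `depend` clauses, which play roughly the role of the OmpSs `in()`/`out()` clauses (OmpSs programs are instead compiled with BSC's Mercurium compiler). The helper names `produce`, `scale`, `add` and `run_pipeline` are made up for this example. Compiled with OpenMP enabled (for example `-fopenmp` in GCC), the two `produce` tasks may run concurrently while the later tasks wait for their inputs; compiled without it, the pragmas are ignored and the same code runs as an ordinary sequential program.

```c
#include <assert.h>

/* Hypothetical kernels, named for this example only. */
static void produce(int *x, int v) { *x = v; }
static void scale(const int *x, int *y) { *y = 2 * *x; }
static void add(const int *y, const int *z, int *r) { *r = *y + *z; }

/* Builds a small task graph: two independent producers, then a chain of
   consumers ordered only by the declared data dependences. */
int run_pipeline(void) {
    int a = 0, b = 0, c = 0, r = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        produce(&a, 5);              /* writes a */

        #pragma omp task depend(out: c)
        produce(&c, 1);              /* writes c; independent of the task above */

        #pragma omp task depend(in: a) depend(out: b)
        scale(&a, &b);               /* runs only after the producer of a */

        #pragma omp task depend(in: b, c) depend(out: r)
        add(&b, &c, &r);             /* runs only after b and c are written */

        #pragma omp taskwait         /* wait for all tasks created above */
    }

    /* The dependences guarantee scale() read the final value of a. */
    assert(b == 2 * a);
    return r;
}
```

Calling `run_pipeline()` returns 11 in both builds: `a` is 5, `c` is 1, `b` is 2 * 5 = 10, and `r` is 10 + 1. Only the execution order of independent tasks changes, which is the property that lets an annotated sequential program adapt to different machines.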
One example of such systems is the new generation of clusters that use accelerators to power their computing capabilities. The memory hierarchy of these machines has two levels: a first level of distributed memory formed by the memory of each individual node, and a second level formed by the private memory of each accelerator device. We propose a reference implementation that enables OmpSs programs to run on a cluster with or without accelerators while providing performance competitive with other programming models. We also discuss extending the OmpSs programming model to support non-contiguous regions of data. This feature allows applications with complex data accesses to be easily annotated with OmpSs, which is important to widen the range of applications the programming model can handle. We present an implementation of non-contiguous region support and evaluate its impact on performance and programmability.
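To see why non-contiguous regions matter, consider a rectangular tile of a row-major matrix: each tile row is contiguous, but consecutive tile rows are a full matrix row apart. The sketch below is a hypothetical illustration, not the region representation used in the thesis (which this abstract does not describe); `tile_t`, `tiles_overlap` and `spans_overlap` are made-up names. It contrasts an exact 2-D overlap test with a naive test on the contiguous element range each tile spans, which is roughly what a runtime can do if it only understands contiguous regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for a rectangular block ("tile") of a row-major
   matrix, in element units. */
typedef struct {
    size_t row0, col0; /* top-left corner within the matrix */
    size_t rows, cols; /* tile extent; both assumed > 0 */
} tile_t;

/* Exact test: two tiles of the same matrix share an element iff their
   row ranges and their column ranges both intersect. */
static bool tiles_overlap(tile_t a, tile_t b) {
    bool rows = a.row0 < b.row0 + b.rows && b.row0 < a.row0 + a.rows;
    bool cols = a.col0 < b.col0 + b.cols && b.col0 < a.col0 + a.cols;
    return rows && cols;
}

/* Naive test: treat each tile as the contiguous element range
   [first, last + 1) it spans in a matrix with leading dimension ld. */
static bool spans_overlap(tile_t a, tile_t b, size_t ld) {
    assert(a.rows > 0 && a.cols > 0 && b.rows > 0 && b.cols > 0);
    size_t sa = a.row0 * ld + a.col0;
    size_t ea = sa + (a.rows - 1) * ld + a.cols;
    size_t sb = b.row0 * ld + b.col0;
    size_t eb = sb + (b.rows - 1) * ld + b.cols;
    return sa < eb && sb < ea;
}
```

For an 8 x 8 matrix split into 4 x 4 tiles, the tiles at (0,0) and (0,4) share no elements, yet their spanned ranges [0, 28) and [4, 32) intersect, so a contiguous-only dependency check would needlessly serialize two independent tasks.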

Tags: Computer science, CUDA, MPI, nVidia, nVidia GeForce GTX 480, OmpSs, OpenMP, Thesis

December 15, 2015 by hgpu
