Compute units in OpenMP: Extensions for heterogeneous parallel programming



Marc Gonzàlez-Tallada, Enric Morancho

Computer Architecture Department, Universitat Politècnica de Catalunya–BarcelonaTech, Barcelona, Spain

Concurrency and Computation: Practice and Experience, e7885, 2023

DOI:10.1002/cpe.7885


This article evaluates the current support in OpenMP 5.2 for heterogeneous applications that simultaneously activate host and device computing units (e.g., CPUs, GPUs, or FPGAs). It identifies limitations in the current OpenMP specification and describes the design and implementation of novel OpenMP extensions and runtime support for heterogeneous parallel programming. The article introduces the Compute Unit (CU) abstraction into the OpenMP programming model, defining a CU as an aggregation of computing elements (e.g., CPUs, GPUs, FPGAs). On top of CUs, it describes dynamic work-sharing constructs and schedulers that address the inherent differences in compute power between host and device CUs, together with the new constructs and corresponding runtime support for these abstractions. The article evaluates a hybrid multilevel parallelization of the NPB-MZ benchmark suite. The implementation exploits both coarse-grain and fine-grain parallelism, mapped to CUs of different kinds (GPUs and CPUs), and all CUs are activated using the new extensions and runtime support. We compare hybrid and nonhybrid executions under two state-of-the-art work-distribution schemes (Static and Dynamic Task schedulers). On a computing node with one AMD EPYC 7742 at 2.25 GHz (64 cores, 2 threads/core, 128 threads in total) and two AMD Radeon Instinct MI50 GPUs with 32 GB each, hybrid executions achieve speedups from 1.08x to 3.18x over a nonhybrid GPU implementation, depending on the number of activated CUs.
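To illustrate why a dynamic work distribution helps when CUs differ in compute power, here is a small simulation sketch. It is not the authors' Static or Dynamic Task scheduler and uses none of their OpenMP extensions; it is a simplified model assuming each CU processes iterations at a fixed, hypothetical relative speed. The static scheme gives every CU an equal share of the iterations; the dynamic scheme lets whichever CU becomes idle first take the next fixed-size chunk. The function names, speeds, and chunk size are all illustrative assumptions.

```python
import heapq


def static_makespan(n_iters, speeds):
    """Static split (simplified model): every CU receives an equal share of
    iterations regardless of its speed; the last CU takes the remainder.
    Returns the time at which the slowest CU finishes."""
    k = len(speeds)
    base = n_iters // k
    shares = [base] * (k - 1) + [n_iters - base * (k - 1)]
    return max(share / speed for share, speed in zip(shares, speeds))


def dynamic_makespan(n_iters, speeds, chunk):
    """Dynamic split (simplified model): the CU that becomes idle first grabs
    the next chunk of iterations. Returns the overall completion time."""
    # Min-heap of (time at which the CU becomes free, CU index).
    ready = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(ready)
    remaining = n_iters
    finish = [0.0] * len(speeds)
    while remaining > 0:
        t, cu = heapq.heappop(ready)
        size = min(chunk, remaining)
        remaining -= size
        done = t + size / speeds[cu]  # faster CUs finish chunks sooner
        finish[cu] = done
        heapq.heappush(ready, (done, cu))
    return max(finish)


if __name__ == "__main__":
    # Hypothetical node: two GPU CUs, each 4x faster than one CPU CU.
    speeds = [4.0, 4.0, 1.0]
    print("static :", static_makespan(1000, speeds))
    print("dynamic:", dynamic_makespan(1000, speeds, chunk=10))
```

With these assumed speeds, the static split is bound by the CPU CU (334 iterations at speed 1, i.e. 334 time units), while the dynamic split finishes between the ideal bound of 1000/9 ≈ 111 time units and that bound plus one CPU chunk (about 121), because faster CUs simply take more chunks. Smaller chunks tighten the bound but, in a real runtime, add scheduling overhead.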

Tags: AMD Radeon Instinct Mi50, ATI, Benchmarking, Computer science, Heterogeneous systems, OpenMP, Task scheduling

August 28, 2023 by hgpu

