high performance computing on graphics processing units: hgpu.org


On the Effectiveness of OpenMP teams for Programming Embedded Manycore Accelerators

Alessandro Capotondi, Andrea Marongiu

Università di Bologna

Workshop on Software Development Environments for High-Performance Computing (DEHPC ’15), 2015


With the introduction of more powerful and massively parallel embedded processors, embedded systems are becoming HPC-capable. In particular, heterogeneous systems-on-chip (SoCs) that couple a general-purpose host processor with a manycore accelerator are becoming increasingly widespread; they provide tremendous peak performance per watt and are well suited to executing HPC-class programs. This increased computational potential, however, is traded off against ease of programming. Application developers are required to manually outline the code parts suitable for acceleration, parallelize them efficiently over the many available cores, and orchestrate data transfers to and from the accelerator. In addition, since most manycores are organized as a collection of clusters, featuring fast local communication but slow remote communication (i.e., to another cluster’s local memory), the programmer must also map the parallel computation carefully to avoid poor data locality. OpenMP v4.0 introduces new constructs for computation offloading, as well as directives to deploy parallel computation in a cluster-aware manner. In this paper we assess the effectiveness of OpenMP v4.0 at exploiting the massive parallelism available in embedded heterogeneous SoCs, compared with standard parallel loops, over several computation-intensive applications from the linear algebra and image processing domains.
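To make the comparison concrete, here is a minimal C sketch (not code from the paper) of the two OpenMP 4.0 styles the abstract contrasts, applied to a matrix-vector product as a stand-in linear algebra kernel. `matvec_parallel_for` offloads with a `target` region and parallelizes the loop with an ordinary `parallel for`, so every iteration is shared by a single team of threads. `matvec_teams` uses the combined `target teams distribute parallel for` construct, which first splits the iterations across a league of teams and then across the threads inside each team. The function names, the kernel, and the `nteams` parameter are illustrative assumptions; whether each team actually lands on its own cluster depends on the OpenMP implementation for the target accelerator.

```c
/* Sketch: two OpenMP 4.0 ways to offload y = A * x
 * (A is n x n, stored row-major). Names are hypothetical. */

/* Standard parallel loop: offload to the device, then one team of
 * threads shares all n rows. */
void matvec_parallel_for(const float *a, const float *x, float *y, int n)
{
    /* map clauses describe the host <-> accelerator data transfers. */
    #pragma omp target map(to: a[0:n*n], x[0:n]) map(from: y[0:n])
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            float acc = 0.0f;
            for (int j = 0; j < n; j++)
                acc += a[i * n + j] * x[j];
            y[i] = acc;
        }
    }
}

/* Teams variant: "distribute" splits the rows across a league of up to
 * nteams teams (assumed, hypothetically, to map one team per cluster);
 * "parallel for" then splits each team's chunk among its threads. */
void matvec_teams(const float *a, const float *x, float *y, int n, int nteams)
{
    (void)nteams; /* avoids an unused-parameter warning without OpenMP */

    #pragma omp target teams distribute parallel for num_teams(nteams) \
        map(to: a[0:n*n], x[0:n]) map(from: y[0:n])
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += a[i * n + j] * x[j];
        y[i] = acc;
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially, so their results can be checked on any host; with `-fopenmp` and no attached accelerator, the `target` regions fall back to host execution. The only difference between the two variants is how iterations are grouped, which is exactly the cluster-locality question the paper evaluates.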

Tags: Computer science, Heterogeneous systems, OpenMP, Performance, SoC

November 8, 2015 by hgpu
