high performance computing on graphics processing units: hgpu.org


HAM – Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi

Matthias Noack

Konrad-Zuse-Zentrum für Informationstechnik Berlin

Konrad-Zuse-Zentrum für Informationstechnik Berlin, 2014


The applicability of accelerators is limited by the attainable speed-up for the offloaded computations and by the offloading overheads. While GPU programming models like CUDA and OpenCL only allow the application code and its speed-up to be optimised, the low-level APIs available for the Intel Xeon Phi also provide an opportunity to address the overheads. This work presents a Heterogeneous Active Message (HAM) layer that minimises software overheads for offloading on Intel's Xeon Phi. It provides the basis for an offload API with semantics similar to those of the Intel Language Extensions for Offload (LEO). In contrast to LEO, HAM works within the C++ language and needs no additional compiler support. We evaluated HAM on top of SCIF and MPI as communication backends. While the SCIF backend offers the best performance, the MPI backend allows for inter-node offloads, which are not possible with other offload solutions. Benchmark results show that the cost of offloading a function call can be decreased by a factor of up to 18 compared with LEO.

Tags: Computer science, Heterogeneous systems, Intel Xeon Phi, MPI

June 17, 2014 by hgpu
