Offload Compiler Runtime for the Intel Xeon Phi Coprocessor

Chris J. Newburn, Rajiv Deodhar, Serguei Dmitriev, Ravi Murty, Ravi Narayanaswamy, John Wiegert, Francisco Chinchilla, Russell McGuire

Intel

Intel, 2013

The Intel Xeon Phi coprocessor platform has a software stack that enables new programming models. One such model is offload of computation from a host processor to a coprocessor that is a fully functional Intel Architecture CPU, namely, the Intel Xeon Phi coprocessor. The purpose of that offload is to improve response time and/or throughput. The attached paper presents the compiler offload software runtime infrastructure for the Intel Xeon Phi coprocessor, which includes a production C/C++ and Fortran compiler that enables offload to that coprocessor, and an underlying Intel Many Integrated Core (Intel MIC) platform software stack that enables offloading. The paper shares insights gained from a multi-year, intensive development effort. It addresses end users’ questions about offload with the compiler offload runtime, namely, why offload to a coprocessor is useful, how it is specified, and under what conditions offload is profitable. It also serves as a guide to potential third-party developers of offload runtimes, such as a gcc-based offload compiler, ports of existing commercial offloading compilers, such as CAPS, to the Intel Xeon Phi coprocessor, and third-party offload library vendors that Intel is working with, such as NAG and MAGMA. It describes the software architecture and design of the offload compiler runtime. It enumerates the key performance features of this heterogeneous computing stack, related to initialization, data movement, and invocation. Finally, it evaluates the performance impact of those features on a set of directed micro-benchmarks and larger workloads.
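The profitability question raised in the abstract can be illustrated with a back-of-the-envelope cost model. The sketch below is not taken from the paper: it simply assumes that an offload pays for initialization, invocation, and data movement on top of the coprocessor's compute time, and that offload is worthwhile when that total is less than the host-only compute time. The names `OffloadCosts`, `offload_time`, and `is_profitable`, and all parameter values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OffloadCosts:
    """Hypothetical cost parameters for one offload (illustrative only)."""
    init_s: float            # initialization cost charged to this offload, seconds
    invoke_s: float          # fixed cost of launching the offloaded region, seconds
    bytes_moved: float       # bytes transferred to and from the coprocessor
    bandwidth_bps: float     # assumed sustained transfer bandwidth, bytes/second
    device_compute_s: float  # compute time on the coprocessor, seconds


def offload_time(c: OffloadCosts) -> float:
    """Total time of the offload under this simple additive model."""
    transfer_s = c.bytes_moved / c.bandwidth_bps
    return c.init_s + c.invoke_s + transfer_s + c.device_compute_s


def is_profitable(host_compute_s: float, c: OffloadCosts) -> bool:
    """Offload is profitable when it finishes sooner than host-only execution."""
    return offload_time(c) < host_compute_s


# Made-up numbers: 4 GB moved at 4 GB/s, 2 s of coprocessor compute.
costs = OffloadCosts(init_s=0.5, invoke_s=0.25, bytes_moved=4e9,
                     bandwidth_bps=4e9, device_compute_s=2.0)
print(offload_time(costs))
print(is_profitable(10.0, costs), is_profitable(3.0, costs))
```

A real runtime can overlap transfers with computation and amortize initialization across many offloads, so this additive model should be read as a pessimistic first estimate rather than a description of the actual runtime's behavior.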

Tags: Compilers, Computer science, Heterogeneous systems, Intel, Intel Phi, Programming techniques

February 18, 2013 by hgpu
