Offload Compiler Runtime for the Intel Xeon Phi Coprocessor

Chris J. Newburn, Rajiv Deodhar, Serguei Dmitriev, Ravi Murty, Ravi Narayanaswamy, John Wiegert, Francisco Chinchilla, Russell McGuire
Intel, 2013







The Intel Xeon Phi coprocessor platform has a software stack that enables new programming models. One such model is the offload of computation from a host processor to a coprocessor that is a fully functional Intel Architecture CPU, namely the Intel Xeon Phi coprocessor. The purpose of offload is to improve response time, throughput, or both. The paper presents the compiler offload software runtime infrastructure for the Intel Xeon Phi coprocessor, which includes a production C/C++ and Fortran compiler that enables offload to that coprocessor and an underlying Intel Many Integrated Core (Intel MIC) platform software stack that enables offloading. The paper shares insights gained from a multi-year, intensive development effort. It addresses end users’ questions about offload with the compiler offload runtime: why offload to a coprocessor is useful, how it is specified, and under what conditions it is profitable. It also serves as a guide for potential third-party developers of offload runtimes, such as a gcc-based offload compiler; for ports of existing commercial offloading compilers, such as CAPS, to the Intel Xeon Phi coprocessor; and for third-party offload library vendors that Intel is working with, such as NAG and MAGMA. It describes the software architecture and design of the offload compiler runtime. It enumerates the key performance features of this heterogeneous computing stack, which relate to initialization, data movement, and invocation. Finally, it evaluates the performance impact of those features on a set of directed micro-benchmarks and larger workloads.
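
The abstract names three cost sources for offload (initialization, data movement, and invocation) and asks when offload is profitable. The abstract does not give the paper's own criteria, so the following is only a generic back-of-the-envelope model. It treats offload as worthwhile when the coprocessor's compute time plus those three overheads is less than the time to run the same region on the host. Every name and field here (`OffloadCost`, `offload_time`, `is_profitable`) is hypothetical, and all numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OffloadCost:
    """Hypothetical cost inputs for one offloaded region (times in seconds)."""
    host_compute: float          # time to run the region on the host
    coprocessor_compute: float   # time to run the region on the coprocessor
    bytes_moved: float           # total bytes copied in both directions
    link_bandwidth: float        # sustained host<->coprocessor bytes per second
    invocation_overhead: float   # fixed cost of launching one offload
    init_overhead: float = 0.0   # share of one-time initialization charged here


def offload_time(cost: OffloadCost) -> float:
    """Estimated wall time of the offloaded region under this simple model."""
    transfer = cost.bytes_moved / cost.link_bandwidth
    return (cost.init_overhead + cost.invocation_overhead
            + transfer + cost.coprocessor_compute)


def is_profitable(cost: OffloadCost) -> bool:
    """Offload pays off only if it beats running the region on the host."""
    return offload_time(cost) < cost.host_compute


# Illustrative (assumed) numbers: a large region and a tiny one.
large = OffloadCost(host_compute=1.0, coprocessor_compute=0.25,
                    bytes_moved=1e9, link_bandwidth=5e9,
                    invocation_overhead=0.001)
tiny = OffloadCost(host_compute=0.001, coprocessor_compute=0.0002,
                   bytes_moved=1e7, link_bandwidth=5e9,
                   invocation_overhead=0.001)
print(is_profitable(large), is_profitable(tiny))
```

The model ignores overlap of transfers with computation and any reuse of data already resident on the coprocessor, either of which could make offload profitable in cases this sketch rejects.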


