high performance computing on graphics processing units: hgpu.org


Porting to the Intel Xeon Phi: Opportunities and Challenges

C. Rosales

Texas Advanced Computing Center, The University of Texas at Austin, J.J. Pickle Research Campus, Building 196, Austin, Texas

Extreme Scaling Workshop (XSCALE13), 2013

This work describes the challenges of porting code to the Intel Xeon Phi coprocessor, as well as the opportunities for optimization and tuning. We use micro-benchmarks, code segments, assembly listings, and application-level results to illustrate the key issues in porting to the Xeon Phi coprocessor, always keeping in mind both portability and performance. While executing code on the Xeon Phi in native mode is fairly straightforward, achieving good performance can be a challenge. The complexity of optimization increases as one introduces offload, distributed offload, or symmetric execution modes. We will initially focus on the fundamental issues that can prevent acceptable performance in native execution, and then address the key issues in data transfers due to either offloaded regions or MPI exchanges with the host CPU. Some of these issues are generic and affect any code using heterogeneous execution (for example, the PCIe bandwidth bottleneck), while others are specific to the Xeon Phi and its software environment (for example, Host/MIC MPI exchanges). We will also make an effort to indicate which issues are specific to this platform and which are of general applicability. In particular, we will draw comparisons between the data management models of the Intel Xeon Phi and the NVIDIA CUDA environment.

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, Intel Phi, MPI, nVidia, Optimization, Performance

September 15, 2013 by hgpu
