high performance computing on graphics processing units: hgpu.org

Porting to the Intel Xeon Phi: Opportunities and Challenges

C. Rosales

Texas Advanced Computing Center, The University of Texas at Austin, J.J. Pickle Research Campus, Building 196, Austin, Texas

Extreme Scaling Workshop (XSCALE13), 2013

@article{rosales2013porting,
  title={Porting to the Intel Xeon Phi: Opportunities and Challenges},
  author={Rosales, C},
  year={2013}
}

This work describes the challenges of porting code to the Intel Xeon Phi coprocessor, as well as the opportunities for optimization and tuning. We use micro-benchmarks, code segments, assembly listings, and application-level results to illustrate the key issues in porting to the Xeon Phi coprocessor, always keeping both portability and performance in mind. While executing code on the Xeon Phi in native mode is fairly straightforward, achieving good performance can be a challenge. The complexity of optimization increases as one introduces offload, distributed offload, or symmetric execution modes. We first focus on the fundamental issues that can prevent acceptable performance in native execution, and then address the key issues in data transfers caused by either offloaded regions or MPI exchanges with the host CPU. Some of these issues are generic and affect any code using heterogeneous execution (for example, the PCIe bandwidth bottleneck), while others are specific to the Xeon Phi and its software environment (for example, host/MIC MPI exchanges). We indicate which issues are specific to this platform and which are of general applicability. In particular, we draw comparisons between the data management models of the Intel Xeon Phi and the NVIDIA CUDA environment.
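The PCIe bottleneck mentioned in the abstract can be illustrated with a back-of-the-envelope cost model: offloading a region only pays off when the transfer time plus the coprocessor compute time is shorter than running the region on the host. The sketch below is not taken from the paper. The function names, the latency-plus-bandwidth transfer model, and all numbers in the usage note are assumptions chosen for illustration, and real transfers may overlap with computation.

```python
def transfer_time_s(num_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Simple link model (an assumption): fixed latency plus size / bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


def offload_pays_off(host_time_s, device_time_s, bytes_in, bytes_out,
                     bandwidth_bytes_per_s, latency_s=0.0):
    """True when copy-in + device compute + copy-out beats host-only compute.

    Assumes transfers and computation do not overlap.
    """
    total = (transfer_time_s(bytes_in, bandwidth_bytes_per_s, latency_s)
             + device_time_s
             + transfer_time_s(bytes_out, bandwidth_bytes_per_s, latency_s))
    return total < host_time_s


def break_even_speedup(host_time_s, bytes_in, bytes_out,
                       bandwidth_bytes_per_s, latency_s=0.0):
    """Minimum device-over-host speedup needed to pay for the transfers.

    Returns None when the transfers alone already take at least as long
    as the host computation, so no speedup can make offload worthwhile.
    """
    transfers = (transfer_time_s(bytes_in, bandwidth_bytes_per_s, latency_s)
                 + transfer_time_s(bytes_out, bandwidth_bytes_per_s, latency_s))
    remaining = host_time_s - transfers
    if remaining <= 0:
        return None
    return host_time_s / remaining
```

As a usage example with hypothetical values: moving 4e9 bytes over a link that sustains 4e9 bytes/s takes about one second, so a region that runs in two seconds on the host would need to run at least twice as fast on the coprocessor before offload breaks even, and a region that runs in one second on the host could never benefit under this model.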

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, Intel Phi, MPI, nVidia, Optimization, Performance

September 15, 2013 by hgpu
