high performance computing on graphics processing units: hgpu.org

Benchmarking Intel Xeon Phi to Guide Kernel Design

Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu

Delft University of Technology

Delft University of Technology, PDS Technical Report PDS-2013-005, 2013

@techreport{fang2013benchmarking,
  title={Benchmarking Intel Xeon Phi to Guide Kernel Design},
  author={Fang, Jianbin and Varbanescu, Ana Lucia and Sips, Henk and Zhang, Lilun and Che, Yonggang and Xu, Chuanfu},
  year={2013}
}

With a minimum of 50 cores, Intel’s Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two levels of caches, and a very fast interconnect, the Xeon Phi can reach a theoretical peak of 1000 GFLOPS and over 240 GB/s of bandwidth. These numbers, as well as its flexibility – it can be used as either a coprocessor or a stand-alone processor – are very tempting for parallel applications looking for new performance records. In this paper, we present four hardware-centric guidelines and a machine model for Xeon Phi programmers in search of performance. Specifically, we have benchmarked the main hardware components of the processor – the cores, the memory hierarchy, and the ring interconnect. We show that, in ideal microbenchmarking conditions, the achieved performance is very close to the theoretical peak given in the official programmer’s guide. Furthermore, we have identified and quantified several causes of significant performance penalties that are not covered in the official documentation. Based on this information, we synthesized four optimization guidelines and applied them to a set of kernels, aiming to systematically optimize their performance. The optimization process is guided by performance roofs derived from the same benchmarks. Our experimental results show that, using this strategy, we can achieve impressive performance gains and, more importantly, a high utilization of the processor.
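To make the idea of a "performance roof" concrete, the following is a minimal sketch of a generic roofline-style bound, using only the two peak figures quoted in the abstract. It is an illustration under assumptions, not the machine model from the report: the function names, the use of arithmetic intensity (FLOPs per byte moved) as the input, and the choice to treat 240 GB/s as the bandwidth roof are all assumptions made here. The roofs in the report are derived from measured benchmarks and may differ.

```python
# Illustrative roofline-style bound (an assumption, not the report's model).
# Peak figures are the theoretical values quoted in the abstract.
PEAK_GFLOPS = 1000.0  # theoretical compute peak
PEAK_GBPS = 240.0     # abstract says "over 240 GB/s"; used here as a conservative roof


def attainable_gflops(arithmetic_intensity, peak_gflops=PEAK_GFLOPS, peak_gbps=PEAK_GBPS):
    """Upper bound on GFLOPS for a kernel with the given FLOPs-per-byte ratio."""
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited either by compute or by how fast data can be moved.
    return min(peak_gflops, arithmetic_intensity * peak_gbps)


def ridge_point(peak_gflops=PEAK_GFLOPS, peak_gbps=PEAK_GBPS):
    """Arithmetic intensity above which the compute roof becomes the limit."""
    return peak_gflops / peak_gbps


if __name__ == "__main__":
    for ai in (0.25, 1.0, 8.0):  # hypothetical kernel intensities
        bound = attainable_gflops(ai)
        kind = "compute-bound" if ai >= ridge_point() else "bandwidth-bound"
        print(f"AI = {ai:5.2f} FLOP/byte -> at most {bound:7.1f} GFLOPS ({kind})")
```

Under these assumed roofs, a kernel with fewer than roughly 4.2 FLOPs per byte is bandwidth-limited, so reducing data movement would matter more than tuning arithmetic.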

Tags: Benchmarking, Computer science, Intel, Intel Phi, Performance

July 14, 2013 by hgpu
