high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Benchmarking the Intel Xeon Phi Coprocessor

Benchmarking the Intel Xeon Phi Coprocessor

F. Masci

Infrared Processing and Analysis Center, Caltech

Infrared Processing and Analysis Center, Caltech, 2013

BibTeX

Download (PDF)

View

Source

2101

views

This document summarizes our first experience with the Intel Xeon Phi. This is a coprocessor that uses Intel’s Many Integrated Core (MIC) architecture to speed up highly parallel processes involving intensive numerical computations. The MIC coprocessor communicates with a regular Intel Xeon ("host") processor through its operating system. The Xeon Phi coprocessor is sometimes referred to as an "accelerator". In a nutshell, the Xeon Phi consists of 60 1.052 GHz cores each capable of executing four concurrent threads and delivering one teraflop of performance. For comparison, the host processor consists of 16 2.6 GHz cores, with two admissible threads per core. More details on the MIC hardware can be obtained from the references in Section 8. Rather than present yet another guide on how to efficiently program for the Xeon Phi, our goal is to explore whether (and when) there are advantages in using the Xeon Phi for the processing of astronomical data, e.g., as in a production pipeline. Such pipelines are common-use at the Infrared Processing and Analysis Center (IPAC) and range from the instrumental calibration of raw image data, astrometry, image co-addition, source extraction, photometry, and source association. We also outline some lessons learned to assist future developers. Note: the findings and opinions reported here are exclusively the author’s and do not reflect those of Intel or of any individual. IPAC has recently acquired a single Xeon Phi card for preliminary benchmarking. We find in general that all existing heritage software based on C/C++/Fortran code can be made to run natively on the Xeon Phi with no recoding. However, whether it will run optimally to fully exploit the MIC architecture is a different question entirely. The answer is usually no. Even software that has been extensively multithreaded to utilize a multicore processor isn’t guaranteed to run faster on the Xeon Phi than on a regular Intel Xeon machine. In fact, depending on memory and/or disk I/O usage, it can be much slower. The key is to make efficient use of the Xeon Phi MIC architecture. This is not designed to handle jobs that are memory (primarily RAM) intensive. It is designed to utilize wide vector instruction units for floating point arithmetic (see below for details). Therefore, the types of problems the Xeon Phi is well suited for are intensive numerical computations with a low memory bandwidth. Additionally, the computations need to use one of the highly optimized vector math libraries that were implemented using assembly language constructs tuned specifically for the Xeon Phi architecture. Knowing this programming model beforehand can assist a developer to design software such that segments with intense numerical calculations can be offloaded to the Xeon Phi to be accelerated. The host processor then does most of the data I/O and memory management.

Tags: Benchmarking, Computer science, Intel Xeon Phi, Performance

February 11, 2014 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Benchmarking the Intel Xeon Phi Coprocessor

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Most viewed papers (last 30 days)

Benchmarking the Intel Xeon Phi Coprocessor

Share this:

Recent source codes

Most viewed papers (last 30 days)