high performance computing on graphics processing units: hgpu.org


Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL

Sabela Ramos, Torsten Hoefler

Scalable Parallel Computing Lab, Department of Computer Science, ETH Zurich

31st IEEE International Parallel & Distributed Processing Symposium (IPDPS’17), 2017

@article{ramo2017scapability,
  title={Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL},
  author={Ramos, Sabela and Hoefler, Torsten},
  year={2017}
}

Increasingly complex memory systems and on-chip interconnects are developed to mitigate the data movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a detailed understanding of the performance characteristics of the different options to utilize the system efficiently. Unfortunately, peak performance is rarely achievable, and achievable performance is hardly documented. We address this with capability models of the memory subsystem, derived by systematic measurements, to help users navigate the complex optimization space. As a case study, we provide an extensive model of all memory configuration options for Xeon Phi KNL. We demonstrate how our capability model can be used to automatically derive new close-to-optimal algorithms for various communication functions, yielding improvements of 5x and 24x over Intel’s tuned OpenMP and MPI implementations, respectively. Furthermore, we demonstrate how to use the models to assess how efficiently a bitonic sort application utilizes the memory resources. Interestingly, our capability models predict and explain that the high-bandwidth MCDRAM does not improve the bitonic sort performance over DRAM.

Tags: Computer science, Intel Xeon Phi, Memory, MPI, OpenMP

April 20, 2017 by hgpu
