high performance computing on graphics processing units: hgpu.org


Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL

Sabela Ramos, Torsten Hoefler

Scalable Parallel Computing Lab, Department of Computer Science, ETH Zurich

31st IEEE International Parallel & Distributed Processing Symposium (IPDPS’17), 2017

@article{ramo2017scapability,
  title={Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL},
  author={Ramos, Sabela and Hoefler, Torsten},
  year={2017}
}


Increasingly complex memory systems and on-chip interconnects are being developed to mitigate data-movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU, with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a detailed understanding of the performance characteristics of the different options to use the system efficiently. Unfortunately, peak performance is rarely achievable, and achievable performance is rarely documented. We address this with capability models of the memory subsystem, derived from systematic measurements, that guide users through the complex optimization space. As a case study, we provide an extensive model of all memory configuration options for Xeon Phi KNL. We demonstrate how our capability model can be used to automatically derive new close-to-optimal algorithms for various communication functions, yielding improvements of 5x and 24x over Intel’s tuned OpenMP and MPI implementations, respectively. Furthermore, we demonstrate how to use the models to assess how efficiently a bitonic sort application utilizes the memory resources. Interestingly, our capability models predict, and explain why, the high-bandwidth MCDRAM does not improve bitonic sort performance over DRAM.
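The abstract describes capability models derived from systematic measurements and then used to judge how efficiently an application uses the memory system. As a hedged illustration only, the sketch below assumes the simplest possible such model, a linear latency/bandwidth cost T(s) = latency + s / bandwidth for moving s bytes. It fits the two parameters to (size, time) measurements by least squares and reports what fraction of the modeled capability an observed transfer achieves. The names LinearCostModel, fit_linear_model, and utilization are hypothetical, and this is not the paper's model; the authors' KNL models cover all memory configuration options and are considerably more detailed.

```python
"""Hypothetical sketch: fitting a linear latency/bandwidth capability model."""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearCostModel:
    """Assumed model: time(s) = latency + s / bandwidth (not the paper's model)."""
    latency: float    # seconds
    bandwidth: float  # bytes per second

    def predict(self, size_bytes: float) -> float:
        """Modeled best-case time to move size_bytes."""
        return self.latency + size_bytes / self.bandwidth


def fit_linear_model(sizes: Sequence[float], times: Sequence[float]) -> LinearCostModel:
    """Ordinary least-squares fit of time = a + b * size, with bandwidth = 1 / b."""
    n = len(sizes)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    slope = sxy / sxx  # seconds per byte
    if slope <= 0:
        raise ValueError("fitted per-byte cost must be positive")
    intercept = mean_t - slope * mean_s
    return LinearCostModel(latency=intercept, bandwidth=1.0 / slope)


def utilization(model: LinearCostModel, size_bytes: float, observed_time: float) -> float:
    """Fraction of modeled capability achieved: 1.0 means the transfer ran at capability."""
    return model.predict(size_bytes) / observed_time


if __name__ == "__main__":
    # Synthetic, noise-free data generated from latency = 1 us, bandwidth = 10 GB/s;
    # these are illustrative numbers, not KNL measurements.
    sizes = [1e3, 1e4, 1e5, 1e6]
    times = [1e-6 + s / 1e10 for s in sizes]
    model = fit_linear_model(sizes, times)
    print(f"latency   = {model.latency:.3e} s")
    print(f"bandwidth = {model.bandwidth:.3e} B/s")
    # A hypothetical kernel that moved 1 MB in twice the modeled time.
    print(f"utilization = {utilization(model, 1e6, 2 * model.predict(1e6)):.2f}")
```

With noise-free synthetic data the fit recovers the assumed parameters; real measurements would need repeated runs and a choice of statistic (for example, the best or median time) before fitting.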

Tags: Computer science, Intel Xeon Phi, Memory, MPI, OpenMP

April 20, 2017 by hgpu

