high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Managing Extreme Heterogeneity in Next Generation HPC Systems

Managing Extreme Heterogeneity in Next Generation HPC Systems

Harsh Khetawat

North Carolina State University

North Carolina State University, 2022

BibTeX

Download (PDF)

View

Source

1284

views

As traditional high performance computing architectures are unable to meet the energy and performance requirements of increasingly intensive applications, HPC centers are moving towards incorporating heterogeneous node architectures in next-generation HPC systems. While GPUs have become quite popular over the last few years as accelerators, other novel acceleration devices such as FPGAs and neural network processors are also gaining attention. Furthermore, heterogeneity is being incorporated in not just compute capabilities but also in the memory hierarchy with technologies such as HBM, NVRAM and PCM (e.g., Intel Optane); in the storage stack with the introduction of burst buffers, both node-local and distributed; and in the network interconnect with technologies such as GPUDirect and NVLink. This creates the need for a careful study of the compute, storage and network stack of HPC systems to extract the most performance from these increasingly heterogeneous node architectures. HPC applications are often composed as computational kernels, where each kernel has different computational characteristics, memory access patterns, communication patterns and accesses to storage devices for I/O. Suitability of different architectural features are therefore highly dependent on the application mix at an HPC center. Furthermore, application performance often depends on the application developer’s ability to utilize the resources available to them. To tackle this extreme heterogeneity that is emerging in HPC systems, we first create a simulation framework that allows HPC centers and application developers to study the optimal placement of storage resources in the context of an HPC interconnect topology. We study the storage performance of different applications with node-local and multiple distributed burst buffer placements in popular HPC network topologies. Next, we develop a simulation framework to study the networking performance of applications in systems that employ modern communication technologies like NVLink and GPUDirect. Our framework is intended to be used by application developers and HPC centers to determine the performance gains that can be achieved by leveraging these novel technologies. Finally, we address the heterogeneity in compute resources. We develop a framework for sharing the work of a single kernel amongst multiple accelerators as well as co-scheduling multiple applications on the same HPC node. We use four applications to study work sharing on a node with a CPU, a GPU and an FPGA. We also create workloads from these applications to assess the co-scheduling performance under four scheduling algorithms. This work shows that a holistic approach is required for the heterogeneity that is emerging in storage, interconnect and compute stacks in modern HPC systems across all components of the system in order to optimally use the variety of resources available in next-generation HPC.

Tags: Computer science, FPGA, Heterogeneous systems, Neural networks, nVidia, nVidia GeForce GTX 2060, OpenCL, Thesis

March 20, 2022 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Managing Extreme Heterogeneity in Next Generation HPC Systems

Your response

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

Managing Extreme Heterogeneity in Next Generation HPC Systems

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)