Managing Extreme Heterogeneity in Next Generation HPC Systems
North Carolina State University
North Carolina State University, 2022
@phdthesis{khetawat2022managing,
  title  = {Managing Extreme Heterogeneity in Next Generation HPC Systems},
  author = {Khetawat, Harsh},
  school = {North Carolina State University},
  year   = {2022}
}
As traditional high performance computing architectures are unable to meet the energy and performance requirements of increasingly intensive applications, HPC centers are moving towards incorporating heterogeneous node architectures in next-generation HPC systems. While GPUs have become quite popular as accelerators over the last few years, other novel acceleration devices such as FPGAs and neural network processors are also gaining attention. Furthermore, heterogeneity is being incorporated not just in compute capabilities but also in the memory hierarchy, with technologies such as HBM, NVRAM and PCM (e.g., Intel Optane); in the storage stack, with the introduction of both node-local and distributed burst buffers; and in the network interconnect, with technologies such as GPUDirect and NVLink. This creates the need for a careful study of the compute, storage and network stacks of HPC systems to extract the most performance from these increasingly heterogeneous node architectures. HPC applications are often composed of computational kernels, where each kernel has different computational characteristics, memory access patterns, communication patterns and accesses to storage devices for I/O. The suitability of different architectural features is therefore highly dependent on the application mix at an HPC center. Furthermore, application performance often depends on the application developer's ability to utilize the resources available to them.

To tackle this extreme heterogeneity that is emerging in HPC systems, we first create a simulation framework that allows HPC centers and application developers to study the optimal placement of storage resources in the context of an HPC interconnect topology. We study the storage performance of different applications with node-local and multiple distributed burst buffer placements in popular HPC network topologies. Next, we develop a simulation framework to study the networking performance of applications in systems that employ modern communication technologies such as NVLink and GPUDirect. Our framework is intended to be used by application developers and HPC centers to determine the performance gains that can be achieved by leveraging these novel technologies. Finally, we address heterogeneity in compute resources. We develop a framework for sharing the work of a single kernel amongst multiple accelerators as well as for co-scheduling multiple applications on the same HPC node. We use four applications to study work sharing on a node with a CPU, a GPU and an FPGA. We also create workloads from these applications to assess co-scheduling performance under four scheduling algorithms.

This work shows that a holistic approach spanning the storage, interconnect and compute stacks of modern HPC systems is required to make optimal use of the variety of resources available in next-generation HPC.
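As a rough illustration of the work-sharing idea described above (a minimal sketch, not the dissertation's actual framework), the snippet below splits a single kernel's iteration space across a CPU, a GPU and an FPGA in proportion to assumed per-device throughputs. The device names and throughput values are hypothetical placeholders.

```python
# Illustrative sketch: throughput-proportional static partitioning of one
# kernel's work items across heterogeneous devices. The throughputs here
# are made-up placeholders, not measurements from the dissertation.

def partition_work(total_items, throughputs):
    """Split total_items across devices in proportion to their throughput."""
    total_tp = sum(throughputs.values())
    devices = list(throughputs.items())
    shares = {}
    assigned = 0
    for i, (device, tp) in enumerate(devices):
        if i == len(devices) - 1:
            # Last device absorbs any rounding remainder.
            shares[device] = total_items - assigned
        else:
            n = int(total_items * tp / total_tp)
            shares[device] = n
            assigned += n
    return shares

if __name__ == "__main__":
    # Hypothetical relative throughputs (work items per second) per device.
    throughputs = {"cpu": 1.0, "gpu": 6.0, "fpga": 3.0}
    split = partition_work(1_000_000, throughputs)
    print(split)  # e.g. {'cpu': 100000, 'gpu': 600000, 'fpga': 300000}
    assert sum(split.values()) == 1_000_000
```

In practice such a split would be refined with measured per-device kernel throughput and data-transfer costs, which is the kind of trade-off the work-sharing and co-scheduling study examines.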
March 20, 2022 by hgpu