Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter

Jie Li, George Michelogiannakis, Brandon Cook, Dulanya Cooray, Yong Chen

Texas Tech University, Lubbock, TX 79409, USA

arXiv:2301.05145 [cs.DC], (12 Jan 2023)

DOI:10.48550/arXiv.2301.05145

The resource demands of HPC applications vary significantly. However, HPC systems commonly assign resources on a per-node basis to prevent interference from co-located workloads. This mismatch between coarse-grained resource allocation and varying resource demands can lead to underutilization of HPC resources. In this study, we comprehensively analyzed the resource usage and characteristics of NERSC Perlmutter, a state-of-the-art HPC system with both CPU-only and GPU-accelerated nodes. Our three-week usage analysis revealed that the majority of jobs had low CPU utilization and that around 86% of both CPU-only and GPU-enabled jobs used 50% or less of the available host memory. Additionally, 52.1% of GPU-enabled jobs used 25% or less of the GPU memory, and memory capacity was, in some respects, over-provisioned for all jobs. The study also found that 60% of GPU-enabled jobs had idle GPUs, which may indicate resource underutilization while users adapt their workflows to a system with new resources. Our research provides valuable insights into performance characterization and offers new perspectives for system operators to understand and track the migration of workloads. Furthermore, it can be useful for designing, optimizing, and procuring HPC systems.
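Figures such as "86% of jobs used 50% or less of host memory" are shares of jobs that fall at or below a usage threshold. The following is a minimal Python sketch of that kind of tally, not the authors' pipeline. It assumes each job has already been reduced to a peak host-memory fraction, a peak GPU-memory fraction, and a count of idle GPUs. The `JobRecord` class, its field names, and the sample records are hypothetical and exist only for illustration. The paper's actual per-job metrics and telemetry sources may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    """Hypothetical per-job summary; field names are illustrative only."""
    job_id: str
    gpus_allocated: int         # 0 for CPU-only jobs
    peak_host_mem_frac: float   # peak host memory used / host memory allocated
    peak_gpu_mem_frac: float    # peak GPU memory used / GPU memory allocated
    idle_gpus: int              # allocated GPUs with no observed activity


def share_at_or_below(values: List[float], threshold: float) -> float:
    """Fraction of values <= threshold; 0.0 for an empty list."""
    if not values:
        return 0.0
    return sum(1 for v in values if v <= threshold) / len(values)


def summarize(jobs: List[JobRecord]) -> Dict[str, float]:
    """Compute threshold shares separately for CPU-only and GPU-enabled jobs."""
    cpu_jobs = [j for j in jobs if j.gpus_allocated == 0]
    gpu_jobs = [j for j in jobs if j.gpus_allocated > 0]
    return {
        # Share of jobs using <= 50% of host memory, per job type.
        "cpu_host_mem_le_50": share_at_or_below(
            [j.peak_host_mem_frac for j in cpu_jobs], 0.50),
        "gpu_host_mem_le_50": share_at_or_below(
            [j.peak_host_mem_frac for j in gpu_jobs], 0.50),
        # Share of GPU-enabled jobs using <= 25% of GPU memory.
        "gpu_mem_le_25": share_at_or_below(
            [j.peak_gpu_mem_frac for j in gpu_jobs], 0.25),
        # Share of GPU-enabled jobs with at least one idle GPU.
        "gpu_jobs_with_idle_gpu": (
            sum(1 for j in gpu_jobs if j.idle_gpus > 0) / len(gpu_jobs)
            if gpu_jobs else 0.0),
    }


# Synthetic records for illustration only (not data from the paper).
jobs = [
    JobRecord("cpu-1", 0, 0.30, 0.0, 0),
    JobRecord("cpu-2", 0, 0.80, 0.0, 0),
    JobRecord("gpu-1", 4, 0.20, 0.10, 2),
    JobRecord("gpu-2", 4, 0.60, 0.50, 0),
    JobRecord("gpu-3", 4, 0.40, 0.20, 1),
]
summary = summarize(jobs)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these five synthetic jobs, one of the two CPU-only jobs and two of the three GPU-enabled jobs stay at or below 50% host memory. Splitting by job type mirrors how the abstract reports CPU-only and GPU-enabled jobs separately.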

Tags: Computer science, HPC, nVidia, nVidia A100, Performance

January 22, 2023 by hgpu
