LLload: An Easy-to-Use HPC Utilization Tool

Chansup Byun, Albert Reuther, Julie Mullen, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Guillermo Morales, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner
Massachusetts Institute of Technology
arXiv:2410.21036 [cs.PF]

@misc{byun2024llloadeasytousehpcutilization,
   title={LLload: An Easy-to-Use HPC Utilization Tool},
   author={Chansup Byun and Albert Reuther and Julie Mullen and LaToya Anderson and William Arcand and Bill Bergeron and David Bestor and Alexander Bonn and Daniel Burrill and Vijay Gadepally and Michael Houle and Matthew Hubbell and Hayden Jananthan and Michael Jones and Piotr Luszczek and Peter Michaleas and Lauren Milechin and Guillermo Morales and Andrew Prout and Antonio Rosa and Charles Yee and Jeremy Kepner},
   year={2024},
   eprint={2410.21036},
   archivePrefix={arXiv},
   primaryClass={cs.PF},
   url={https://arxiv.org/abs/2410.21036}
}

The increasing use and cost of high performance computing (HPC) require new, easy-to-use tools that let HPC users and HPC systems engineers transparently understand how resources are utilized. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed a simple command, LLload, to monitor and characterize HPC workloads. LLload plays an important role in identifying opportunities for better utilization of compute resources. It can monitor jobs both programmatically and interactively, and its options let users characterize their jobs to achieve better efficiency. This information can guide users in optimizing HPC workloads and improving both CPU and GPU utilization, including through judicious oversubscription of computing resources. Preliminary results suggest that, in some cases, GPU overloading significantly improves GPU utilization and overall throughput. By enabling users to observe and fix incorrect job submissions or inappropriate execution setups, LLload can increase resource usage and improve overall throughput. LLload is a lightweight, easy-to-use tool with which both HPC users and HPC systems engineers can monitor HPC workloads to improve system utilization and efficiency.