
Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing

Iñigo Gabirondo López
Universidad de Zaragoza, 2024

@article{gabirondotowards,
   title={Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing},
   author={Gabirondo L{\'o}pez, I{\~n}igo and Su{\'a}rez Gracia, Dar{\'i}o and Gran Tejero, Rub{\'e}n},
   year={2024}
}


The demand for data centers has increased due to recent advances in Artificial Intelligence. These data centers are composed of thousands of servers with cooling systems that consume large amounts of energy. The servers usually contain several processing units that can cooperate to solve computational tasks. Properly partitioning the entire workload among the processing units of the same machine greatly reduces the server's total execution time and power consumption. Hence, load balancing algorithms that produce good workload partitions can improve energy efficiency. This work presents a deep learning based load balancer designed for CPU-GPU heterogeneous systems. The load balancer takes as input an OpenCL kernel, the work group size of the kernel, and the input size in bytes, and it outputs the amount of work to assign to the CPU. The load balancer leverages the heterogeneous device mapping model presented in ProGraML (device mapping aims to select the single best processing unit among several). The original ProGraML implementation exhibited very poor performance in our setup, which made it impossible to run any experiments. To solve this performance issue, we migrated the full ProGraML project to the pytorch-geometric library and then adapted the heterogeneous device mapping model for the load balancing task. Experimental results show that the load balancer accurately predicts workload partitions, even when the testing setup differs from the setup used for labelling the datasets. The final model predicted 6 of the 8 tested kernels within 20% of the theoretical work partition; among those 6 kernels, 3 were predicted with an error of less than 10%. In addition, our new ProGraML implementation removes the performance issue of the original work, is publicly available, and is built on a library that makes new experiments easier to run.
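To make the described interface concrete, below is a minimal sketch (not the thesis's actual code) of how such a model could be assembled in pytorch-geometric: a ProGraML-style gated GNN encodes the OpenCL kernel's program graph, the pooled graph embedding is concatenated with the two auxiliary inputs (work group size and input size in bytes), and a small MLP head regresses the fraction of work to assign to the CPU. The class name, the data.aux attribute, the vocabulary size, and the layer widths are all illustrative assumptions.

    import torch
    import torch.nn as nn
    from torch_geometric.nn import GatedGraphConv, global_mean_pool

    class LoadBalancerGNN(nn.Module):
        def __init__(self, vocab_size, hidden=64, num_layers=6):
            super().__init__()
            # One embedding per program-graph node token (vocab_size is an assumption;
            # in practice it would come from the ProGraML vocabulary).
            self.embed = nn.Embedding(vocab_size, hidden)
            self.ggnn = GatedGraphConv(out_channels=hidden, num_layers=num_layers)
            self.head = nn.Sequential(
                nn.Linear(hidden + 2, hidden),   # +2 for the auxiliary features
                nn.ReLU(),
                nn.Linear(hidden, 1),
                nn.Sigmoid(),                    # CPU share constrained to [0, 1]
            )

        def forward(self, data):
            # data.x: [num_nodes, 1] vocabulary ids; data.aux: [batch, 2] auxiliary
            # features (e.g. log-scaled work group size and input size in bytes).
            h = self.embed(data.x.squeeze(-1))
            h = self.ggnn(h, data.edge_index)
            g = global_mean_pool(h, data.batch)  # one embedding per kernel graph
            return self.head(torch.cat([g, data.aux], dim=1))

Replacing the original device-mapping classification head (which picks a single device) with a sigmoid regression head is one natural way to turn the model into a load balancer, since the output can be read directly as the CPU's share of the partitioned workload.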
