
Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing

Iñigo Gabirondo López
Universidad de Zaragoza, 2024

@article{gabirondotowards,
   title={Towards autonomous resource management: Deep learning prediction of CPU-GPU load balancing},
   author={Gabirondo L{\'o}pez, I{\~n}igo and Su{\'a}rez Gracia, Dar{\'i}o and Gran Tejero, Rub{\'e}n},
   year={2024}
}


The demand for data centers has increased due to recent advances in Artificial Intelligence. These data centers are composed of thousands of servers with cooling systems that consume large amounts of energy. The servers usually contain several processing units that can cooperate to solve computational tasks. Properly partitioning the entire workload among the processing units of the same machine greatly reduces the server's total execution time and power consumption. Hence, load balancing algorithms that produce good workload partitions can improve energy efficiency. This work presents a deep learning based load balancer designed for CPU-GPU heterogeneous systems. The load balancer takes as input an OpenCL kernel, the work group size of the kernel, and the input size in bytes, and it outputs the amount of work to assign to the CPU. The load balancer leverages the heterogeneous device mapping model presented in ProGraML (device mapping aims to select the single best processing unit among several). The original ProGraML implementation exhibited very poor performance in our setup, which made it impossible to run any experiments. To solve this performance issue, we migrated the full ProGraML project to the pytorch-geometric library and then adapted the heterogeneous device mapping model for the load balancing task. Experimental results show that the load balancer accurately predicts workload partitions, even when the testing setup differs from the setup used for labelling the datasets. The final model predicted 6 of the 8 tested kernels within 20% of the theoretical work partition; among those 6 kernels, 3 were predicted with an error of less than 10%. In addition, our new ProGraML implementation removes the performance issue of the original work, is publicly available, and is built on a library that makes new experiments easier to run.
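To make the described interface concrete, below is a minimal sketch (not the thesis's actual code) of how such a model could be assembled in pytorch-geometric: a ProGraML-style gated GNN encodes the OpenCL kernel's program graph, the pooled graph embedding is concatenated with the two auxiliary inputs (work group size and input size in bytes), and a small MLP head regresses the fraction of work to assign to the CPU. The class name, the data.aux attribute, the vocabulary size, and the layer widths are all illustrative assumptions.

    import torch
    import torch.nn as nn
    from torch_geometric.nn import GatedGraphConv, global_mean_pool

    class LoadBalancerGNN(nn.Module):
        def __init__(self, vocab_size, hidden=64, num_layers=6):
            super().__init__()
            # One embedding per program-graph node token (vocab_size is an assumption;
            # in practice it would come from the ProGraML vocabulary).
            self.embed = nn.Embedding(vocab_size, hidden)
            self.ggnn = GatedGraphConv(out_channels=hidden, num_layers=num_layers)
            self.head = nn.Sequential(
                nn.Linear(hidden + 2, hidden),   # +2 for the auxiliary features
                nn.ReLU(),
                nn.Linear(hidden, 1),
                nn.Sigmoid(),                    # CPU share constrained to [0, 1]
            )

        def forward(self, data):
            # data.x: [num_nodes, 1] vocabulary ids; data.aux: [batch, 2] auxiliary
            # features (e.g. log-scaled work group size and input size in bytes).
            h = self.embed(data.x.squeeze(-1))
            h = self.ggnn(h, data.edge_index)
            g = global_mean_pool(h, data.batch)  # one embedding per kernel graph
            return self.head(torch.cat([g, data.aux], dim=1))

Replacing the original device-mapping classification head (which picks a single device) with a sigmoid regression head is one natural way to turn the model into a load balancer, since the output can be read directly as the CPU's share of the partitioned workload.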
