A multi-agent architecture for scheduling of high performance services in a GPU cluster
CONACYT-Centro de Investigacion en Matematicas
International Journal of Combinatorial Optimization Problems and Informatics, Vol 9, No 1, 2018
@article{trejo2018multi,
title={A multi-agent architecture for scheduling of high performance services in a GPU cluster},
author={Trejo-S{\'a}nchez, Joel Antonio and L{\'o}pez-Mart{\'i}nez, Jos{\'e} Luis and Garc{\'i}a, Jos{\'e} Octavio Guti{\'e}rrez and Ram{\'i}rez-Pacheco, Julio C{\'e}sar and Fajardo-Delgado, Daniel},
journal={International Journal of Combinatorial Optimization Problems and Informatics},
volume={9},
number={1},
pages={12--22},
year={2018}
}
Nowadays, clusters containing multiple GPU nodes are widely used to execute high-performance computing applications. Diverse disciplines use these clusters to improve the performance of several services that consume high computational resources. The challenge of executing high-performance computing applications becomes harder when the applications are executed concurrently and each one of them may demand multiple GPU nodes for different periods of time. To tackle this challenge, we propose a multi-agent architecture for scheduling multiple services in a heterogeneous GPU cluster. We provide simulation results of our agent-based system utilizing three commonly used scheduling heuristics for several configuration settings.
March 22, 2018 by hgpu