Efficient Configuration of Heterogeneous Resources and Task Scheduling Strategies in Deep Learning Auto-Tuning Systems
National Changhua University of Education
The Journal of Supercomputing, 2024
@article{ken2024efficient,
  title={Efficient Configuration of Heterogeneous Resources and Task Scheduling Strategies in Deep Learning Auto-Tuning Systems},
  author={Ken, Pao-Yi and Wu, Chao-Chin},
  journal={The Journal of Supercomputing},
  year={2024}
}
Automatic hyperparameter tuning for deep learning plays a crucial role in advancing Artificial Intelligence applications, reducing the need for specialized expertise and costly manual effort. Ray Tune, developed at the University of California, Berkeley, has gained widespread adoption among notable companies such as Amazon and Uber. In contrast to large enterprises, the hardware commonly used by the general public is often a mix of old and new machines with varying specifications, creating a highly heterogeneous computing environment. This study focuses on optimizing the utilization of heterogeneous resources during machine learning training to improve overall system performance. Through experiments and analysis with Ray Tune, two optimization strategies are explored for different configurations: checkpoint location optimization and a scheduling strategy for consolidating heterogeneous resources. Experimental results show that, with reasonable configurations, storing checkpoints in both main memory and external storage can effectively reduce overall training time. Adopting the heterogeneous resource consolidation scheduling strategy and dynamically allocating tasks based on the computing capability of each configured node yields a 2.36-fold speedup in overall training time. These optimization strategies offer valuable insights into effectively leveraging heterogeneous resources in automatic hyperparameter tuning.
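The abstract refers to two Ray Tune mechanisms: where trial checkpoints are written and how many resources each trial requests. The sketch below is only a rough illustration of those mechanisms, not the paper's actual implementation; it assumes Ray Tune's older functional trainable API (tune.run, tune.report, tune.checkpoint_dir), and the checkpoint path, resource counts, and search space are illustrative placeholders.

```python
# Hedged sketch: checkpoint placement and per-trial resource requests in Ray Tune.
# NOT the paper's implementation; paths, resource counts, and the search space
# are placeholders, and the exact API differs across Ray versions (this follows
# the tune.run / tune.report functional API).
import os
from ray import tune


def trainable(config):
    # Toy "training" loop standing in for a real model.
    acc = 0.0
    for step in range(10):
        acc += config["lr"]  # placeholder update
        # Persist a checkpoint under the trial directory. Pointing the trial
        # directory at main memory (e.g. a tmpfs such as /dev/shm) versus
        # external storage is the kind of choice the checkpoint-location
        # strategy in the abstract compares.
        with tune.checkpoint_dir(step=step) as ckpt_dir:
            with open(os.path.join(ckpt_dir, "state.txt"), "w") as f:
                f.write(str(acc))
        tune.report(accuracy=acc)


analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.01, 0.05, 0.1])},
    # Per-trial resource request: sizing this to each node's computing
    # capability is the lever behind resource-aware task placement.
    resources_per_trial={"cpu": 2, "gpu": 0},
    # local_dir controls where trial results and checkpoints are written;
    # a RAM-backed filesystem vs. disk changes checkpoint I/O cost.
    local_dir="/dev/shm/ray_results",  # hypothetical RAM-backed location
)
print(analysis.get_best_config(metric="accuracy", mode="max"))
```

In practice, heterogeneous clusters can also combine this with custom resource labels per node so that larger trials are steered toward the more capable machines; the paper's own consolidation strategy is described in the full text rather than in this sketch.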