
Efficient Configuration of Heterogeneous Resources and Task Scheduling Strategies in Deep Learning Auto-Tuning Systems

Pao-Yi Ken, Chao-Chin Wu
National Changhua University of Education
The Journal of Supercomputing, 2024

@article{ken2024efficient,
   title={Efficient Configuration of Heterogeneous Resources and Task Scheduling Strategies in Deep Learning Auto-Tuning Systems},
   author={Ken, Pao-Yi and Wu, Chao-Chin},
   journal={The Journal of Supercomputing},
   year={2024}
}

Automatic hyperparameter tuning for deep learning plays a crucial role in advancing artificial intelligence applications, reducing the need for specialized expertise and costly manual effort. Ray Tune, developed at the University of California, Berkeley, has been widely adopted by companies such as Amazon and Uber. Unlike large enterprises, however, the general public typically works with a mix of old and new machines of varying specifications, creating a highly heterogeneous computing environment. This study focuses on making better use of heterogeneous resources during machine learning training to improve overall system performance. Through experiments and analysis with Ray Tune, two optimization strategies are explored for different configurations: checkpoint location optimization and a scheduling strategy that consolidates heterogeneous resources. Experimental results show that, with reasonable configurations, storing checkpoints in both main memory and external storage effectively reduces overall training time. Adopting the heterogeneous resource consolidation scheduling strategy and dynamically allocating tasks according to the computing capability of each configured node yields a 2.36-fold speedup in overall training time. These optimization strategies offer valuable insights into effectively leveraging heterogeneous resources for automatic hyperparameter tuning.
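To make the two strategies concrete, the sketch below shows how they could be expressed with Ray Tune's public API. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Ray 2.x cluster where the faster machines advertise a hypothetical custom resource label ("fast_gpu_node", set via ray start --resources='{"fast_gpu_node": 1}'), a toy training function, and a placeholder shared storage path. Checkpoints are written locally and persisted to external storage via storage_path; trial placement is steered toward capable nodes with a placement group.

import os
import tempfile

from ray import train, tune


def train_fn(config):
    # Toy stand-in for a real training loop.
    acc = 0.0
    for step in range(10):
        acc += config["lr"]
        # Checkpoint placement: write trial state to local disk; Ray then
        # persists it to the external storage_path configured below.
        with tempfile.TemporaryDirectory() as tmp:
            with open(os.path.join(tmp, "state.txt"), "w") as f:
                f.write(str(step))
            train.report({"accuracy": acc},
                         checkpoint=train.Checkpoint.from_directory(tmp))


# Heterogeneous scheduling: request the hypothetical "fast_gpu_node" label so
# trials are placed on the nodes that advertise it.
trainable = tune.with_resources(
    train_fn,
    tune.PlacementGroupFactory(
        [{"CPU": 2.0, "GPU": 1.0, "fast_gpu_node": 0.01}]
    ),
)

tuner = tune.Tuner(
    trainable,
    param_space={"lr": tune.grid_search([0.01, 0.05, 0.1])},
    # Placeholder path; point this at NFS or S3 in a real cluster.
    run_config=train.RunConfig(storage_path="/mnt/shared/ray_results"),
)
tuner.fit()

A capability-proportional variant would size each trial's CPU/GPU bundle from measured node throughput, which is closer in spirit to the dynamic task allocation the paper evaluates.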

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
