DC | AI | LG | Dec 11, 2025

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

arXiv:2512.10271v1 | h-index: 8 | SoCC
Originality: Incremental advance
AI Analysis

This work addresses the challenge cloud providers face in managing diverse DL workloads efficiently without per-job profiling, though it builds incrementally on existing RL and optimization methods.

The paper tackles the problem of scheduling deep learning workloads on heterogeneous GPU clusters by introducing RLTune, a reinforcement learning-based framework that combines RL-driven job prioritization with mixed-integer linear programming (MILP)-based job-to-node mapping. The reported gains are up to 20% higher GPU utilization, up to 81% lower queueing delay, and up to 70% shorter job completion times.

Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limited visibility into application characteristics pose major challenges for existing schedulers, which often rely on offline profiling or application-specific assumptions. We present RLTune, an application-agnostic reinforcement learning (RL)-based scheduling framework that dynamically prioritizes and allocates DL jobs on heterogeneous GPU clusters. RLTune integrates RL-driven prioritization with MILP-based job-to-node mapping to optimize system-wide objectives such as job completion time (JCT), queueing delay, and resource utilization. Trained on large-scale production traces from Microsoft Philly, Helios, and Alibaba, RLTune improves GPU utilization by up to 20%, reduces queueing delay by up to 81%, and shortens JCT by as much as 70%. Unlike prior approaches, RLTune generalizes across diverse workloads without requiring per-job profiling, making it practical for cloud providers to deploy at scale for more efficient, fair, and sustainable DL workload management.
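
The two-stage idea described in the abstract (score jobs first, then map them to nodes under capacity constraints) can be illustrated with a toy sketch. This is not RLTune's implementation: the `priority` function is a hand-written stand-in for a learned RL policy, the exhaustive search stands in for a real MILP solver, and every job, node, field name, and number below is a hypothetical chosen for illustration.

```python
from itertools import product

# Hypothetical toy data; none of these values come from the paper.
jobs = {
    "a": {"gpus": 2, "wait": 30.0},
    "b": {"gpus": 1, "wait": 5.0},
    "c": {"gpus": 4, "wait": 12.0},
}
nodes = {
    "n1": {"free_gpus": 4, "speed": 1.0},
    "n2": {"free_gpus": 2, "speed": 2.0},
}


def priority(job):
    """Stand-in for a learned policy: favours long waits and small requests."""
    return job["wait"] / job["gpus"]


def map_jobs(jobs, nodes):
    """Brute-force the small assignment problem a MILP solver would handle.

    Each job is placed on exactly one node or left queued (None).
    A plan is feasible if no node exceeds its free GPU count.
    The assumed objective is the sum of priority * node speed over placed jobs.
    """
    names = list(jobs)
    options = [None] + list(nodes)
    best_score, best_plan = -1.0, {}
    for choice in product(options, repeat=len(names)):
        used = {n: 0 for n in nodes}
        score = 0.0
        for name, node in zip(names, choice):
            if node is None:
                continue  # job stays in the queue this round
            used[node] += jobs[name]["gpus"]
            score += priority(jobs[name]) * nodes[node]["speed"]
        feasible = all(used[n] <= nodes[n]["free_gpus"] for n in nodes)
        if feasible and score > best_score:
            best_score, best_plan = score, dict(zip(names, choice))
    return best_plan, best_score


plan, score = map_jobs(jobs, nodes)
print(plan, score)
```

With this toy input, the highest-priority job lands on the faster node and the 4-GPU job stays queued because no feasible placement beats that plan. Exhaustive search only works at this scale; a production scheduler would hand the same constraints to an actual MILP solver.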

