LG · AI · Apr 10

On the Role of DAG topology in Energy-Aware Cloud Scheduling: A GNN-Based Deep Reinforcement Learning Approach

arXiv:2604.09202 · 2.5 · h-index: 4
Predicted impact: top 93% in LG · last 90 days · Originality: Synthesis-oriented
AI Analysis

This work addresses a reliability concern for cloud providers in energy-aware scheduling, but it is incremental: it exposes limitations of existing schedulers rather than proposing a new solution.

The paper tackles the problem of scheduling workflow DAGs in cloud computing to minimize completion time and energy usage. It finds that GNN-based deep reinforcement learning schedulers fail under specific out-of-distribution conditions because of structural mismatches between training and deployment environments.

Cloud providers must assign heterogeneous compute resources to workflow DAGs while balancing competing objectives such as completion time, cost, and energy consumption. In this work, we study a single-workflow, queue-free scheduling setting and consider a graph neural network (GNN)-based deep reinforcement learning scheduler designed to minimize workflow completion time and energy usage. We identify specific out-of-distribution (OOD) conditions under which GNN-based deep reinforcement learning schedulers fail and provide a principled explanation of why these failures occur. Through controlled OOD evaluations, we demonstrate that performance degradation stems from structural mismatches between training and deployment environments, which disrupt message passing and undermine policy generalization. Our analysis exposes fundamental limitations of current GNN-based schedulers and highlights the need for more robust representations to ensure reliable scheduling performance under distribution shifts.
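The link between DAG structure and message passing can be illustrated with a toy example. The sketch below is not the paper's architecture (the abstract does not specify one); it assumes a single round of sum aggregation over predecessor tasks, with hypothetical names `message_passing_round`, `chain`, and `fan_in`. It shows that the same node features produce differently scaled embeddings on a chain-shaped workflow and on a wide fan-in workflow, which is one plausible way a topology shift could push a trained policy outside the inputs it saw during training.

```python
from typing import Dict, List


def message_passing_round(
    preds: Dict[int, List[int]], feats: Dict[int, float]
) -> Dict[int, float]:
    """One round of sum aggregation (an illustrative assumption).

    Each task's new value is its own feature plus the sum of the
    features of its direct predecessors in the DAG.
    """
    return {v: feats[v] + sum(feats[u] for u in preds[v]) for v in preds}


# Hypothetical training-like topology: a chain 0 -> 1 -> 2 -> 3.
chain = {0: [], 1: [0], 2: [1], 3: [2]}

# Hypothetical deployment-like topology: tasks 0, 1, 2 all feed task 3.
fan_in = {0: [], 1: [], 2: [], 3: [0, 1, 2]}

# Identical per-task features in both workflows.
feats = {v: 1.0 for v in range(4)}

chain_out = message_passing_round(chain, feats)
fan_in_out = message_passing_round(fan_in, feats)

# The sink task's embedding grows with its in-degree, so the two
# topologies yield different value ranges from the same features.
print("chain sink embedding:", chain_out[3])
print("fan-in sink embedding:", fan_in_out[3])
```

With sum aggregation the sink embedding scales with in-degree, so a policy that only saw low in-degree workflows would receive larger values on wide workflows. Other aggregators (mean, max) shift differently, and the paper's own analysis should be consulted for the failure modes it actually identifies.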
