DC, AI, LG · Sep 22, 2025

Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

arXiv:2509.22701v1 · h-index: 5 · IPDPSW
Originality: Incremental advance
AI Analysis

This work addresses real-time adaptability in cluster scheduling for HPC environments. It offers a scalable solution, though its advance in integrating machine learning into the scheduling of constrained tasks is incremental.

This study optimizes task scheduling in cluster systems with node-affinity constraints by developing a continuous transfer learning model that adapts dynamically during operation. On Google Cluster Data, the model achieves over 99% accuracy while reducing computational overhead and scheduling latency.

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.
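The idea of a model that keeps adapting during operation instead of being retrained from scratch can be illustrated with a small sketch. This is not the paper's model: it swaps in a plain online logistic regression, and the two features (`requested_cpu`, `requested_mem`), the "placeable" label, the thresholds, and every class and function name are hypothetical, chosen only to make the example self-contained. It shows a model trained on one synthetic workload that continues to receive small incremental updates after the workload drifts.

```python
import math
import random


class OnlineLogisticModel:
    """Logistic regression updated incrementally with plain SGD (illustrative only)."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, batch):
        # One SGD step per sample; earlier weights are kept, not reset.
        for x, y in batch:
            err = self.predict_proba(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err


def make_batch(rng, n, threshold):
    """Synthetic tasks (assumption): x = [requested_cpu, requested_mem] in [0, 1];
    label 1 ('placeable') when the combined demand is below `threshold`."""
    batch = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        batch.append((x, 1 if x[0] + x[1] < threshold else 0))
    return batch


def accuracy(model, batch):
    return sum(model.predict(x) == y for x, y in batch) / len(batch)


def run_demo(seed=0):
    rng = random.Random(seed)
    model = OnlineLogisticModel(n_features=2)

    # Phase 1: initial training on the "source" workload.
    for _ in range(40):
        model.partial_fit(make_batch(rng, 50, threshold=1.0))

    # Phase 2: the workload drifts; measure accuracy on the new workload
    # before and after a stream of small incremental updates.
    holdout = make_batch(rng, 1000, threshold=0.6)
    before = accuracy(model, holdout)
    for _ in range(40):
        model.partial_fit(make_batch(rng, 50, threshold=0.6))
    after = accuracy(model, holdout)
    return before, after


if __name__ == "__main__":
    before, after = run_demo()
    print(f"accuracy on drifted workload: before={before:.3f}, after={after:.3f}")
```

The design point is that `partial_fit` reuses the existing weights, so adapting to the drifted workload costs only a pass over recent samples rather than a full retraining run. A real scheduler would need real trace features and a model suited to them.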

