DC AI LG | Sep 22, 2025

Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization

Leszek Sliwko, Jolanta Mizera-Pietraszko

arXiv:2509.22701v1 | h-index: 5 | IPDPSW

Originality: Incremental advance

AI Analysis

This work addresses the limited real-time adaptability of cluster schedulers in HPC environments. It offers a scalable solution and is an incremental advance in integrating machine learning into the scheduling of constrained tasks.

This study optimizes task scheduling in cluster systems with node-affinity constraints by developing a continuous transfer learning model that adapts dynamically during operation. The model achieves over 99% accuracy on Google Cluster Data while reducing computational overhead and scheduling latency.

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.
