AILOMay 19, 2020

Safe Learning for Near Optimal Scheduling

arXiv:2005.09253v24 citations
AI Analysis

This work addresses the challenge of scalable and safe scheduling for complex systems, which is incremental as it builds on existing methods like Monte-Carlo tree search and safety games.

The paper tackles the problem of safe and near-optimal scheduling for preemptible tasks in large Markov decision processes (MDPs) with over 10^20 states, achieving this by combining synthesis, model-based learning, and online sampling techniques with PAC guarantees and empirical comparisons against shielded deep Q-learning.

In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 1020 states and beyond which cannot be handled with state-of-the art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes