SY, LG, RO
Oct 24, 2019

Robust Model Predictive Shielding for Safe Reinforcement Learning with Stochastic Dynamics

arXiv:1910.10885v2
104 citations
Originality: Incremental advance
AI Analysis

This paper addresses safety concerns for reinforcement learning in stochastic environments. The advance is incremental, since it builds on existing shielding methods.

The paper tackles the problem of ensuring safety in reinforcement learning for stochastic nonlinear dynamical systems. It proposes a model predictive shielding framework that uses a tube-based robust NMPC controller as the backup controller, and it empirically demonstrates safety on cart-pole and on a non-holonomic particle with random obstacles.

This paper proposes a framework for safe reinforcement learning that can handle stochastic nonlinear dynamical systems. We focus on the setting where the nominal dynamics are known, and are subject to additive stochastic disturbances with known distribution. Our goal is to ensure the safety of a control policy trained using reinforcement learning, e.g., in a simulated environment. We build on the idea of model predictive shielding (MPS), where a backup controller is used to override the learned policy as needed to ensure safety. The key challenge is how to compute a backup policy in the context of stochastic dynamics. We propose to use a tube-based robust NMPC controller as the backup controller. We estimate the tubes using sampled trajectories, leveraging ideas from statistical learning theory to obtain high-probability guarantees. We empirically demonstrate that our approach can ensure safety in stochastic systems, including cart-pole and a non-holonomic particle with random obstacles.
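
The shielding idea in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the scalar dynamics in `step`, the safe set |x| <= `SAFE_LIMIT`, the Gaussian disturbance, and the simple contracting `backup_policy`, which stands in for the tube-based robust NMPC controller. The tube radius is taken as an empirical quantile of sampled deviations, which is a simplification and carries no formal high-probability guarantee of the kind the abstract refers to. The recoverability check also ignores the disturbance on the learned policy's own step.

```python
import random

SAFE_LIMIT = 1.0   # hypothetical safe set: |x| <= SAFE_LIMIT
HORIZON = 10       # hypothetical look-ahead horizon for the backup rollout
NOISE_STD = 0.02   # hypothetical std of the additive Gaussian disturbance


def step(x, u, w=0.0):
    """Hypothetical scalar dynamics: known nominal part x + u plus disturbance w."""
    return x + u + w


def backup_policy(x):
    """Stand-in backup controller that contracts toward the origin (not robust NMPC)."""
    return -0.5 * x


def estimate_tube_radius(n_samples=500, quantile=0.99, seed=0):
    """Empirical quantile of the worst deviation between noisy and nominal backup rollouts."""
    rng = random.Random(seed)
    worst = []
    for _ in range(n_samples):
        x_nom = x_real = 0.5
        dev = 0.0
        for _ in range(HORIZON):
            x_nom = step(x_nom, backup_policy(x_nom))
            x_real = step(x_real, backup_policy(x_real), rng.gauss(0.0, NOISE_STD))
            dev = max(dev, abs(x_real - x_nom))
        worst.append(dev)
    worst.sort()
    return worst[min(int(quantile * n_samples), n_samples - 1)]


def is_recoverable(x, radius):
    """True if the nominal backup rollout from x, inflated by the tube, stays safe."""
    for _ in range(HORIZON + 1):
        if abs(x) > SAFE_LIMIT - radius:
            return False
        x = step(x, backup_policy(x))
    return True


def shield(x, u_learned, radius):
    """Keep the learned action if the backup can still recover afterwards, else override."""
    x_next = step(x, u_learned)  # nominal prediction of the learned action
    if is_recoverable(x_next, radius):
        return u_learned
    return backup_policy(x)


if __name__ == "__main__":
    r = estimate_tube_radius()
    print("tube radius:", r)
    print("action near origin:", shield(0.0, 0.1, r))    # learned action kept
    print("action near boundary:", shield(0.9, 0.5, r))  # backup overrides
```

The design choice worth noting is that the learned policy is never modified: the shield only decides, at each step, whether the proposed action leaves the system in a state from which the backup controller can keep the tube inside the safe set.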

