LG · ML · May 25, 2019

Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding

arXiv:1905.10691v3 · 16 citations
Originality: Incremental advance
AI Analysis

The paper addresses safety concerns in robotics applications such as walking robots and autonomous cars. The advance is incremental: it assumes known dynamics and a policy trained in simulation.

The paper tackles the problem of ensuring safety for reinforcement learning policies in robotics by proposing model predictive shielding (MPS), which switches between a learned policy and a backup policy to maintain safety, and empirically validates it on the cart-pole task.

Reinforcement learning is a promising approach to synthesizing policies for challenging robotics tasks. A key problem is how to ensure safety of the learned policy, e.g., that a walking robot does not fall over or that an autonomous car does not run into an obstacle. We focus on the setting where the dynamics are known, and the goal is to ensure that a policy trained in simulation satisfies a given safety constraint. We propose an approach, called model predictive shielding (MPS), that switches on-the-fly between a learned policy and a backup policy to ensure safety. We prove that our approach guarantees safety, and empirically evaluate it on the cart-pole.
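
The switching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use the cart-pole. It assumes a toy double integrator with known dynamics, a deliberately unsafe stand-in for the learned policy, a braking backup policy, and a hypothetical switching rule: take the learned action only if simulating the backup policy from the resulting state stays safe and comes to rest within a fixed horizon. All names (`step`, `is_recoverable`, `shielded_policy`, and so on) are illustrative, and exact `Fraction` arithmetic is used only to keep the toy example free of rounding effects.

```python
from fractions import Fraction

DT = Fraction(1, 10)  # time step of the assumed toy dynamics


def step(state, action):
    """Known dynamics (assumed): a 1-D double integrator."""
    pos, vel = state
    return (pos + vel * DT, vel + action * DT)


def is_safe(state):
    """Assumed safety constraint: position must stay within [-1, 1]."""
    return abs(state[0]) <= 1


def is_stable(state):
    """At rest: the backup policy keeps such a state unchanged."""
    return state[1] == 0


def learned_policy(state):
    """Stand-in for a learned policy; always accelerates (unsafe alone)."""
    return Fraction(1)


def backup_policy(state):
    """Brake as hard as the action bound [-1, 1] allows."""
    return max(Fraction(-1), min(Fraction(1), -state[1] / DT))


def is_recoverable(state, horizon=50):
    """Simulate the backup policy; it must stay safe and come to rest."""
    for _ in range(horizon + 1):
        if not is_safe(state):
            return False
        if is_stable(state):
            return True
        state = step(state, backup_policy(state))
    return False


def shielded_policy(state):
    """Use the learned action only if the next state is recoverable."""
    action = learned_policy(state)
    if is_recoverable(step(state, action)):
        return action, "learned"
    return backup_policy(state), "backup"


def rollout(policy, steps=200):
    """Run a policy from rest at the origin; return states and sources."""
    state = (Fraction(0), Fraction(0))
    states, sources = [state], []
    for _ in range(steps):
        action, source = policy(state)
        state = step(state, action)
        states.append(state)
        sources.append(source)
    return states, sources
```

In this sketch, safety follows by induction: the start state is recoverable, a learned action is accepted only when it leads to a recoverable state, and the backup action taken from a recoverable state leads to the next state of the already-checked backup trajectory. Calling `rollout(shielded_policy)` keeps every visited state within the bound, whereas running the stand-in learned policy alone leaves it.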

Code Implementations: 1 repo
