LG · AI · May 19, 2024

Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning

arXiv:2405.11669v1 · 2 citations · h-index: 23 · L4DC
Originality: Incremental advance
AI Analysis

This paper addresses safety in RL for control systems, such as autonomous vehicles, by proposing a constraint formulation that is more feasible and philosophically grounded. The contribution appears incremental, since it builds on existing constrained optimization approaches.

The paper tackles the problem of safe reinforcement learning by introducing a counterfactual harm constraint that penalizes agents only for violations they cause, compared to a default safe policy. In simulations with a rover and tractor-trailer, this approach enabled learning safer policies than existing constrained RL methods.

Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment. When considering safety constraints, constrained optimization approaches, where agents are penalized for constraint violations, are commonly used. In such methods, if agents are initialized in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy. In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem. We present simulation studies on a rover with uncertain road friction and a tractor-trailer parking environment that demonstrate our constraint formulation enables agents to learn safer policies than contemporary constrained RL methods.
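To make the idea of a counterfactual harm constraint concrete, the following is a minimal sketch, not the paper's definition. It assumes that harm is estimated from paired rollouts (the learned policy and the default policy run from the same initial conditions) as the average excess constraint cost, clipped at zero, and that the constraint is enforced with a simple Lagrange multiplier update. The function names, the clipping, the budget, and the step size are all hypothetical choices made for illustration.

```python
from typing import Sequence


def counterfactual_harm(learned_costs: Sequence[float],
                        default_costs: Sequence[float]) -> float:
    """Average excess constraint cost of the learned policy over the default.

    Assumption (not from the paper): each index is one paired scenario, and
    harm in a scenario is max(0, learned cost - default cost), so violations
    that the default policy would also incur are not counted.
    """
    if len(learned_costs) != len(default_costs):
        raise ValueError("rollouts must be paired one-to-one")
    if not learned_costs:
        raise ValueError("need at least one paired rollout")
    excess = [max(0.0, l - d) for l, d in zip(learned_costs, default_costs)]
    return sum(excess) / len(excess)


def update_multiplier(lam: float, harm: float, budget: float,
                      step_size: float = 0.1) -> float:
    """One projected dual-ascent step on a hypothetical Lagrange multiplier.

    The multiplier grows while estimated harm exceeds the budget and is
    projected back to zero otherwise.
    """
    return max(0.0, lam + step_size * (harm - budget))


# Scenarios 1 and 2: the learned policy does no worse than the default.
# Scenario 3: the learned policy incurs 1.5 more cost than the default.
learned = [1.0, 0.0, 2.0]
default = [1.0, 0.0, 0.5]
harm = counterfactual_harm(learned, default)   # 0.5
absolute = sum(learned) / len(learned)         # 1.0, what a plain cost constraint sees
```

In this toy example the first scenario carries a cost of 1.0 under both policies, so it contributes nothing to the harm estimate, whereas a constraint on absolute cost would still penalize it. Under the assumptions of this sketch, a learner that simply imitates the default policy always has zero harm, which is one way to read the abstract's claim that the formulation maintains feasibility.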
