LGAIROFeb 14, 2022

Robust Policy Learning over Multiple Uncertainty Sets

arXiv:2202.07013v225 citations
AI Analysis

This addresses the challenge of robustness in safety-critical RL environments, offering a more general solution than single-set robust RL, though it is incremental in combining existing ideas.

The paper tackles the problem of reinforcement learning agents needing robustness to environmental variations by proposing a method that learns a policy robust to multiple uncertainty sets, demonstrating improved worst-case performance on new environments compared to prior approaches.

Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes