Robust Policy Learning over Multiple Uncertainty Sets
This work addresses robustness in safety-critical RL environments and offers a more general solution than single-set robust RL, though the contribution is incremental in that it combines existing ideas.
The paper addresses the need for reinforcement learning agents to be robust to environmental variations. It proposes a method that learns a policy robust to multiple uncertainty sets and reports improved worst-case performance on new environments compared to prior approaches.
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL, which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
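The idea of reducing uncertainty from a few interactions and then acting robustly over what remains can be illustrated with a toy sketch. This is not the paper's algorithm: the one-dimensional dynamics, the finite candidate set, the tolerance, and the helper names `prune_uncertainty_set` and `robust_action` are all assumptions made for illustration.

```python
# Toy illustration only: a finite set of candidate dynamics parameters stands in
# for an uncertainty set. All names and numbers here are hypothetical.

def predict(theta, state, action):
    """Assumed 1-D dynamics: the parameter theta scales the action's effect."""
    return state + theta * action

def prune_uncertainty_set(candidates, transitions, tol):
    """Keep the parameters whose predictions match every observed transition."""
    kept = [
        theta for theta in candidates
        if all(abs(predict(theta, s, a) - s_next) <= tol
               for s, a, s_next in transitions)
    ]
    # If the observations rule out every candidate, fall back to the full set.
    return kept or list(candidates)

def value(theta, action):
    """Assumed objective: land as close as possible to 1.0 starting from 0.0."""
    return -abs(predict(theta, 0.0, action) - 1.0)

def robust_action(actions, uncertainty_set):
    """Choose the action with the best worst-case value over the set."""
    return max(actions, key=lambda a: min(value(t, a) for t in uncertainty_set))

candidates = [0.5, 1.0, 1.5, 2.0]   # initial uncertainty set
actions = [0.8, 1.0, 1.3]           # available actions
transitions = [(0.0, 0.1, 0.08)]    # one noisy (state, action, next_state) sample

remaining = prune_uncertainty_set(candidates, transitions, tol=0.04)
action_before = robust_action(actions, candidates)  # robust to the full set
action_after = robust_action(actions, remaining)    # robust to what is left
```

In this toy setup the single observation rules out the two larger parameters, and the worst-case-optimal action shifts from 0.8 to 1.3. The agent is less conservative once part of the uncertainty is resolved, but it still hedges against the candidates it cannot yet tell apart.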