Action Robust Reinforcement Learning and Applications in Continuous Control
This work addresses robustness to action uncertainty in RL, which matters for applications such as robotics where perturbations are common. The contribution is incremental: it builds on existing RL methods and adds new robustness criteria.
The paper tackles the vulnerability of reinforcement learning policies to action perturbations, such as adversarial actions or noise, by formalizing two new robustness criteria and developing algorithms for both tabular and deep RL settings. The approach not only produces robust policies in MuJoCo domains but also improves performance when no perturbations are present, suggesting that action robustness acts as implicit regularization.
A policy is said to be robust if it maximizes the reward while considering a bad, or even adversarial, model. In this work we formalize two new criteria of robustness to action uncertainty. Specifically, we consider two scenarios in which the agent attempts to perform an action $a$, and (i) with probability $\alpha$, an alternative adversarial action $\bar a$ is taken, or (ii) an adversary adds a perturbation to the selected action in the case of continuous action space. We show that our criteria are related to common forms of uncertainty in robotics domains, such as the occurrence of abrupt forces, and suggest algorithms in the tabular case. Building on the suggested algorithms, we generalize our approach to deep reinforcement learning (DRL) and provide extensive experiments in the various MuJoCo domains. Our experiments show that not only does our approach produce robust policies, but it also improves the performance in the absence of perturbations. This generalization indicates that action-robustness can be thought of as implicit regularization in RL problems.
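The two scenarios above can be illustrated with a minimal Python sketch of how the executed action might be formed from the agent's action $a$ and the adversary's action $\bar a$. The function names `probabilistic_action` and `noisy_action` are hypothetical. The abstract does not give the exact form of the perturbation in scenario (ii), so the convex combination $(1-\alpha)a + \alpha \bar a$ used here is an assumption made for illustration, as is the reuse of $\alpha$ as the perturbation strength.

```python
import random


def probabilistic_action(agent_action, adversary_action, alpha, rng=random):
    """Scenario (i): with probability alpha the adversary's action is executed
    instead of the agent's intended action."""
    if rng.random() < alpha:
        return adversary_action
    return agent_action


def noisy_action(agent_action, adversary_action, alpha):
    """Scenario (ii), assumed form: the adversary perturbs a continuous action
    by blending it with its own action, dimension by dimension."""
    return [
        (1.0 - alpha) * a + alpha * a_bar
        for a, a_bar in zip(agent_action, adversary_action)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded generator so runs are repeatable
    # Discrete example: the agent wants "forward", the adversary pushes "stop".
    print(probabilistic_action("forward", "stop", alpha=0.1, rng=rng))
    # Continuous example: a 2-D torque command blended with an adversarial one.
    print(noisy_action([1.0, 0.0], [0.0, 1.0], alpha=0.25))
```

In both cases the agent's policy would be trained against an adversary that picks $\bar a$ to minimize reward; the sketch only shows how the two actions are combined at execution time, not how either policy is learned.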