LG ML

Jun 1, 2020

Robust Reinforcement Learning with Wasserstein Constraint

Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhiming Ma, Dawei Yin

arXiv:2006.00945v1

15.330 citations

Originality: Incremental advance

AI Analysis

This work addresses robustness in reinforcement learning for systems sensitive to environmental dynamics. It appears incremental, since it builds on existing robust RL methods and mainly contributes a new metric.

The paper addresses robustness in reinforcement learning by using the Wasserstein distance to measure disturbances in transition probabilities. This reduces an infinite-dimensional optimization problem to a finite-dimensional risk-aware problem. The paper proposes the WRAAC algorithm and verifies it in the Cart-Pole environment.

Robust reinforcement learning aims to find an optimal policy that retains some degree of robustness to environmental dynamics. Existing learning algorithms usually achieve robustness by disturbing the current state or by simulating environmental parameters heuristically, which provides no quantified robustness to the system dynamics (i.e., the transition probability). To overcome this issue, we leverage the Wasserstein distance to measure the disturbance to the reference transition kernel. With the Wasserstein distance, we are able to connect transition kernel disturbance to state disturbance, i.e., to reduce an infinite-dimensional optimization problem to a finite-dimensional risk-aware problem. Through the derived risk-aware optimal Bellman equation, we show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm, the Wasserstein Robust Advantage Actor-Critic algorithm (WRAAC). The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
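
To make the idea of measuring a disturbance to a reference transition kernel concrete, the following is a minimal sketch and not the WRAAC algorithm. It computes the 1-Wasserstein distance between two discrete next-state distributions on a shared one-dimensional support, then checks whether a perturbed distribution lies within a given radius of the reference. The function names, the one-dimensional state support, the choice of the order-1 distance, and all numbers are assumptions made for illustration; the abstract specifies none of them.

```python
from itertools import accumulate


def wasserstein_1d(support, p, q):
    """1-Wasserstein distance between two discrete distributions p and q
    defined on the same 1-D support (illustrative helper, not from the paper).

    For 1-D distributions, W1 equals the integral of |CDF_p - CDF_q|.
    """
    # Sort all three sequences together by support location.
    triples = sorted(zip(support, p, q))
    xs = [t[0] for t in triples]
    cdf_p = list(accumulate(t[1] for t in triples))
    cdf_q = list(accumulate(t[2] for t in triples))
    # Sum the CDF gap times the width of each interval between support points.
    return sum(
        abs(cdf_p[i] - cdf_q[i]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def in_wasserstein_ball(support, reference, candidate, radius):
    """True if `candidate` is within `radius` of `reference` in W1."""
    return wasserstein_1d(support, reference, candidate) <= radius


# Hypothetical next-state distributions for one fixed (state, action) pair.
support = [0.0, 1.0, 2.0]
reference = [0.5, 0.5, 0.0]   # assumed reference transition probabilities
perturbed = [0.5, 0.0, 0.5]   # moves mass 0.5 from state 1.0 to state 2.0

distance = wasserstein_1d(support, reference, perturbed)
print(distance)
print(in_wasserstein_ball(support, reference, perturbed, radius=0.6))
```

Here the perturbation moves probability mass 0.5 over a distance of 1.0, so the transport cost depends on how far states shift as well as how much probability changes. This is the property that lets a bound on the kernel disturbance be read as a bound on state disturbance.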
