LG · May 15, 2022

Policy Gradient Method For Robust Reinforcement Learning

arXiv:2205.07344v1 · 106 citations · h-index: 21
Originality: Highly original
AI Analysis

This paper addresses a problem faced by reinforcement learning practitioners: learning policies that stay robust when the simulator does not match the real environment. It is presented as a foundational advance with theoretical guarantees.

The paper tackles robust reinforcement learning under model mismatch by developing the first policy gradient method with global optimality guarantees and complexity analysis, achieving an $\mathcal O(\varepsilon^{-3})$ complexity for an $\varepsilon$-global optimum and extending to model-free settings with asymptotic convergence.

This paper develops the first policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning aims to learn a policy that is robust to model mismatch between the simulator and the real environment. We first develop the robust policy (sub-)gradient, which is applicable to any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method and show that to achieve an $\varepsilon$-global optimum, the complexity is $\mathcal O(\varepsilon^{-3})$. We then extend our methodology to the general model-free setting and design the robust actor-critic method with a differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity under the tabular setting. Finally, we provide simulation results to demonstrate the robustness of our methods.
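To make the notion of a worst-case (robust) value concrete, here is a minimal sketch. It is not the paper's algorithm. The abstract does not specify the uncertainty set, so the sketch assumes a small finite set of candidate transition kernels, a reward-maximizing agent, and an adversary that picks the worst kernel separately for each state-action pair. The names `robust_policy_evaluation` and `soft_min` are hypothetical. The `soft_min` helper illustrates one common way to smooth a non-differentiable min (LogSumExp); whether the paper's smoothing takes this form is an assumption.

```python
import numpy as np


def robust_policy_evaluation(kernels, rewards, policy, gamma=0.9, iters=500):
    """Worst-case value of a fixed policy over a finite set of models.

    kernels: array of shape (K, S, A, S), K candidate transition kernels
             (an assumed, illustrative uncertainty set)
    rewards: array of shape (S, A)
    policy:  array of shape (S, A), each row sums to 1
    """
    num_states = kernels.shape[1]
    v = np.zeros(num_states)
    for _ in range(iters):
        # Expected next-state value under each candidate model: shape (K, S, A).
        next_v = kernels @ v
        # The adversary picks the worst model independently per (state, action).
        worst = next_v.min(axis=0)
        q = rewards + gamma * worst
        # Average the action values under the fixed policy.
        v = (policy * q).sum(axis=1)
    return v


def soft_min(x, sigma=50.0):
    """Smooth lower bound on min(x) via LogSumExp; tightens as sigma grows."""
    x = np.asarray(x, dtype=float)
    m = x.min()
    # Shift by the minimum for numerical stability before exponentiating.
    return m - np.log(np.exp(-sigma * (x - m)).sum()) / sigma
```

Replacing the hard `min` with `soft_min` makes the worst-case term differentiable in the value estimates, at the cost of a bias of at most `log(n) / sigma` for `n` candidates.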

