LG · Nov 15, 2022

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

arXiv:2211.07937v2 · 31.1 · 129 citations · h-index: 98

Originality: Incremental advance

AI Analysis

This work tightens the convergence analysis of policy gradient methods in reinforcement learning, offering incremental improvements for researchers and practitioners in the field.

The paper improves convergence guarantees for policy gradient (PG) and natural policy gradient (NPG) methods. It shows that a variance-reduced PG method converges to the globally optimal value up to an inherent function approximation error, and that NPG has a lower sample complexity than previously established. It also proposes SRVR-NPG, which achieves global convergence with an efficient finite-sample complexity.

In this paper, we revisit and improve the convergence of policy gradient (PG) and natural PG (NPG) methods, as well as their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from an observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we have also made variance reduction for NPG possible, with both global convergence and an efficient finite-sample complexity.
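As a rough illustration of the PG/NPG distinction the abstract builds on, the sketch below compares vanilla PG with NPG, which preconditions the gradient by the inverse of the Fisher information matrix, on a hypothetical 3-armed bandit (a single-state MDP) with a softmax policy and exact gradients. The rewards, step size, iteration count, and the damping term added to the Fisher matrix are all assumptions of this sketch: the damping is needed because the softmax Fisher matrix is singular, whereas the paper assumes a positive definite one. The sketch uses no sampling or variance reduction, so it is not the paper's SRVR-NPG.

```python
import math

# Hypothetical toy problem: a single-state MDP (3-armed bandit) with known rewards.
REWARDS = [1.0, 0.5, 0.2]


def softmax(theta):
    # Numerically stable softmax policy pi_theta(a).
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]


def value(theta):
    # J(theta) = sum_a pi(a) * r(a)
    pi = softmax(theta)
    return sum(p * r for p, r in zip(pi, REWARDS))


def policy_gradient(theta):
    # Exact softmax gradient: dJ/dtheta_k = pi_k * (r_k - J).
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, REWARDS))
    return [p * (r - j) for p, r in zip(pi, REWARDS)]


def fisher(theta, damping):
    # F = E_a[grad log pi(a) grad log pi(a)^T] = diag(pi) - pi pi^T for softmax.
    # Assumption of this sketch: add damping * I so that F is positive definite.
    pi = softmax(theta)
    n = len(pi)
    return [[(pi[i] if i == k else 0.0) - pi[i] * pi[k] + (damping if i == k else 0.0)
             for k in range(n)] for i in range(n)]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def run(natural, steps=200, lr=0.1, damping=1e-3):
    # Gradient ascent on J; NPG uses the direction F^{-1} grad J instead of grad J.
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        g = policy_gradient(theta)
        d = solve(fisher(theta, damping), g) if natural else g
        theta = [t + lr * di for t, di in zip(theta, d)]
    return value(theta)


print("PG value: ", run(natural=False))
print("NPG value:", run(natural=True))
```

For a softmax policy the exact NPG direction is, up to the damping correction, the reward vector shifted by a constant, so NPG moves the logits at a roughly constant rate while vanilla PG slows down as the policy concentrates on the best arm. On this toy problem, `run(natural=True)` therefore ends closer to the optimal value of 1.0 than `run(natural=False)` does for the same step size and iteration count.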
