LG · OC · Jan 2, 2024

Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction

arXiv:2401.01084v2 · 4 citations · h-index: 4 · J Sci Comput
AI Analysis

This work addresses convergence issues in policy gradient methods for reinforcement learning, offering an incremental improvement with a new variant that achieves state-of-the-art sample complexity.

The paper proposes NPG-HM, a natural policy gradient variant that uses Hessian-aided momentum for variance reduction to improve convergence in reinforcement learning. NPG-HM achieves global last-iterate ε-optimality with a sample complexity of O(ε^{-2}), the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. In numerical experiments on MuJoCo-based environments, it outperforms other state-of-the-art policy gradient methods.

Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve the global last-iterate $ε$-optimality with a sample complexity of $\mathcal{O}(ε^{-2})$, which is the best known result for natural policy gradient type methods under the generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
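
The two ingredients named in the abstract, a Hessian-aided momentum gradient estimator and an SGD solve of the NPG sub-problem, can be illustrated with a rough sketch. This is not the paper's algorithm: it assumes the estimator takes the form commonly used in the Hessian-aided momentum literature, d_t = (1 - β)(d_{t-1} + H(θ_t - θ_{t-1})) + β g_t, with the Hessian-vector product taken at a random point between consecutive iterates. The reinforcement learning problem is replaced by a toy noisy quadratic objective, and the Fisher matrix by a fixed positive definite stand-in. All function names, step sizes, and noise levels are hypothetical.

```python
import numpy as np

def hessian_aided_momentum(d_prev, grad, hvp, beta):
    """Assumed estimator form: (1 - beta) * (d_prev + hvp) + beta * grad,
    where hvp approximates H @ (theta_t - theta_{t-1})."""
    return (1.0 - beta) * (d_prev + hvp) + beta * grad

def sgd_subproblem(fisher_sample, d, w0, steps, lr):
    """Approximately solve min_w 0.5 * w^T F w - d^T w (so w ~ F^{-1} d)
    by SGD, drawing a fresh stochastic estimate of F at every step."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * (fisher_sample() @ w - d)
    return w

def run_toy(num_iters=300, alpha=0.1, beta=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Toy objective (assumption): J(theta) = -0.5 theta^T A theta + b^T theta.
    A = np.diag([1.0, 3.0])
    b = np.array([1.0, -2.0])
    F = np.diag([2.0, 0.5])  # fixed stand-in for the Fisher matrix
    theta_star = np.linalg.solve(A, b)  # maximizer of J

    def noisy_grad(theta):
        return -(A @ theta - b) + 0.1 * rng.standard_normal(2)

    def hvp(theta, v):
        # The Hessian of a quadratic is constant, so theta is unused here;
        # in RL this product would be estimated from sampled trajectories.
        return -(A @ v)

    def fisher_sample():
        noise = rng.standard_normal((2, 2))
        return F + 0.05 * (noise + noise.T) / 2.0

    theta = np.zeros(2)
    d = noisy_grad(theta)  # initial gradient estimate
    w = np.zeros(2)
    for _ in range(num_iters):
        # Inner loop: preconditioned direction, warm-started from the last w.
        w = sgd_subproblem(fisher_sample, d, w, steps=10, lr=0.2)
        theta_prev, theta = theta, theta + alpha * w
        # Evaluate the Hessian-vector product at a random interpolation point.
        q = rng.uniform()
        theta_q = q * theta + (1.0 - q) * theta_prev
        d = hessian_aided_momentum(
            d, noisy_grad(theta), hvp(theta_q, theta - theta_prev), beta
        )
    return theta, theta_star

if __name__ == "__main__":
    theta, theta_star = run_toy()
    print("final iterate:", theta, "maximizer:", theta_star)
```

In this sketch the Hessian correction transports the previous estimate d_{t-1} to the new iterate before it is averaged with the fresh gradient sample, so only the sampling noise is averaged down over time. Warm-starting the inner SGD solve from the previous direction is a design choice of the sketch, not something stated in the abstract.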
