LGAIOCOct 4, 2022

Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

arXiv:2210.01400v344 citationsh-index: 49
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for policy optimization in reinforcement learning, but it is incremental as it builds on existing methods and frameworks.

The authors tackled the convergence analysis of natural policy gradient methods in infinite-horizon discounted Markov decision processes, showing that these methods achieve linear convergence rates and $ ilde{\mathcal{O}}(1/ε^2)$ sample complexities with a simple step size, without requiring strong convexity.

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/ε^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes