LGOCMLJul 21, 2020

On Linear Convergence of Policy Gradient Methods for Finite MDPs

arXiv:2007.11120v270 citations
AI Analysis

This provides a theoretical improvement for reinforcement learning practitioners by establishing faster convergence guarantees in a foundational setting.

The paper tackles the problem of analyzing policy gradient methods in finite MDPs, showing that many variants achieve linear convergence rates with large step-sizes, contrasting prior sub-linear results.

We revisit the finite time analysis of policy gradient methods in the one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems and showing sub-linear convergence rates with small step-sizes. Here, we take a different perspective based on connections with policy iteration and show that many variants of policy gradient methods succeed with large step-sizes and attain a linear rate of convergence.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes