MLLGMar 9, 2020

Stochastic Recursive Momentum for Policy Gradient Methods

arXiv:2003.04302v135 citations
AI Analysis

This work addresses efficiency and usability issues in reinforcement learning for practitioners, though it is incremental as it builds on existing variance-reduced methods.

The authors tackled the problem of high sample complexity and complex parameter tuning in policy gradient methods by proposing STORM-PG, which achieves a provably sharp O(1/ε^3) sample complexity bound and simplifies tuning by avoiding batch alternations.

In this paper, we propose a novel algorithm named STOchastic Recursive Momentum for Policy Gradient (STORM-PG), which operates a SARAH-type stochastic recursive variance-reduced policy gradient in an exponential moving average fashion. STORM-PG enjoys a provably sharp $O(1/ε^3)$ sample complexity bound for STORM-PG, matching the best-known convergence rate for policy gradient algorithm. In the mean time, STORM-PG avoids the alternations between large batches and small batches which persists in comparable variance-reduced policy gradient methods, allowing considerably simpler parameter tuning. Numerical experiments depicts the superiority of our algorithm over comparative policy gradient algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes