LG, ML · Jun 14, 2018

Stochastic Variance-Reduced Policy Gradient

arXiv:1806.05618v1 · 207 citations
Originality: Incremental advance
AI Analysis

This work provides a more efficient algorithm for reinforcement-learning practitioners, though it is an incremental contribution, since it adapts existing supervised-learning techniques to a new domain.

The paper tackles the challenge of adapting stochastic variance-reduced gradient (SVRG) methods to policy-gradient reinforcement learning, addressing issues such as non-concave objectives and non-stationary sampling. The result is SVRPG, an algorithm with convergence guarantees and a convergence rate that is linear under increasing batch sizes.

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, adapting them to policy gradient is not straightforward and must account for (i) a non-concave objective function; (ii) approximations in the full-gradient computation; and (iii) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and empirically evaluate them on continuous MDPs.
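The abstract's core mechanism, a variance-reduced gradient whose snapshot term is reweighted with importance weights, can be illustrated with a minimal Python sketch. It follows one plausible reading of that idea: compute a snapshot gradient μ from N trajectories at θ̃, then take inner steps with v = μ + (1/B) Σ [g(τ | θ_t) − ω(τ) g(τ | θ̃)], where ω(τ) = p(τ | θ̃) / p(τ | θ_t) and τ is sampled from the current policy. Everything concrete here is a hypothetical assumption, not the authors' code: a one-step MDP with reward −(a − 2)², a Gaussian policy with fixed standard deviation, the REINFORCE estimator, and the names and hyperparameters (`svrpg`, `reinforce_grad`, `importance_weight`, `lr`, `batch`, and so on).

```python
import math
import random

# Hypothetical toy problem (not from the paper): a one-step MDP where the
# policy samples a ~ N(theta, SIGMA^2) and the reward is -(a - TARGET)^2.
# Expected return J(theta) = -((theta - TARGET)^2 + SIGMA^2) peaks at TARGET.
SIGMA = 0.5
TARGET = 2.0


def sample_action(theta, rng):
    """Sample one trajectory (here a single action) from the Gaussian policy."""
    return rng.gauss(theta, SIGMA)


def reinforce_grad(a, theta):
    """REINFORCE estimate g(tau | theta) = R(tau) * d/dtheta log pi(a | theta)."""
    reward = -(a - TARGET) ** 2
    score = (a - theta) / SIGMA ** 2
    return reward * score


def importance_weight(a, theta_snap, theta_cur):
    """omega = pi(a | theta_snap) / pi(a | theta_cur); Gaussian constants cancel."""
    log_ratio = ((a - theta_cur) ** 2 - (a - theta_snap) ** 2) / (2 * SIGMA ** 2)
    return math.exp(log_ratio)


def svrpg(theta0, epochs=40, inner_steps=5, n_snapshot=200, batch=20, lr=0.02, seed=0):
    """Hypothetical SVRPG-style loop: snapshot gradient plus weighted correction."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        theta_snap = theta
        # Approximate the full gradient at the snapshot with n_snapshot trajectories.
        mu = sum(reinforce_grad(sample_action(theta_snap, rng), theta_snap)
                 for _ in range(n_snapshot)) / n_snapshot
        for _ in range(inner_steps):
            correction = 0.0
            for _ in range(batch):
                # Inner-loop trajectories come from the current policy, not the snapshot.
                a = sample_action(theta, rng)
                w = importance_weight(a, theta_snap, theta)
                # Reweighting keeps the snapshot term unbiased under current sampling.
                correction += reinforce_grad(a, theta) - w * reinforce_grad(a, theta_snap)
            v = mu + correction / batch
            theta += lr * v  # gradient ascent on the expected return
    return theta


if __name__ == "__main__":
    print(round(svrpg(theta0=0.0), 2))  # should land close to TARGET = 2.0
```

Taking expectations over the current policy, the weighted term E[ω g(τ | θ̃)] equals the snapshot gradient, so v is an unbiased estimate of the gradient at θ_t; dropping ω would bias it, because the inner-loop samples no longer come from the snapshot policy.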

Code Implementations: 1 repo