LG ML Jun 14, 2018

Stochastic Variance-Reduced Policy Gradient

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

arXiv:1806.05618v127.6207 citations Has Code

Originality: Incremental advance

AI Analysis

This work provides a more efficient algorithm for reinforcement learning practitioners, though it is incremental as it adapts existing supervised learning techniques to a new domain.

The paper tackles the challenge of adapting stochastic variance-reduced gradient (SVRG) methods to policy gradient reinforcement learning. It addresses three obstacles: a non-concave objective, approximate full-gradient computation, and a non-stationary sampling process. The resulting SVRPG algorithm comes with convergence guarantees, with a convergence rate that is linear under increasing batch sizes.

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages on importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
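To make the role of the importance weights concrete, here is a minimal sketch of an SVRG-style policy gradient step on a toy problem. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-step MDP with a one-dimensional Gaussian policy, the quadratic reward, the constants `SIGMA` and `TARGET`, the function names, and all hyperparameters. The idea is that a large-batch gradient `mu_snap` is computed at a snapshot parameter `theta_snap`, and each inner step corrects it with a small batch sampled from the current policy. Because those samples no longer come from the snapshot policy, the snapshot term is reweighted by the likelihood ratio so that the estimate stays unbiased.

```python
import math
import random

SIGMA = 1.0   # fixed policy standard deviation (assumed for this toy problem)
TARGET = 2.0  # the reward peaks when the action equals this value (assumed)


def reward(action):
    # Hypothetical quadratic reward for a one-step MDP.
    return -(action - TARGET) ** 2


def log_prob(action, theta):
    # Log density of N(theta, SIGMA^2) up to a constant that cancels in ratios.
    return -0.5 * ((action - theta) / SIGMA) ** 2


def score_gradient(action, theta):
    # Single-sample REINFORCE-style estimate: grad_theta log pi(a) * r(a).
    return (action - theta) / SIGMA ** 2 * reward(action)


def svrpg_gradient(theta, theta_snap, mu_snap, actions):
    """Variance-reduced estimate; `actions` must be sampled from the current policy."""
    total = 0.0
    for a in actions:
        # Importance weight pi(a | theta_snap) / pi(a | theta) corrects for the
        # fact that `a` was drawn from the current policy, not the snapshot.
        weight = math.exp(log_prob(a, theta_snap) - log_prob(a, theta))
        total += score_gradient(a, theta) - weight * score_gradient(a, theta_snap)
    return mu_snap + total / len(actions)


def run_svrpg(theta=0.0, epochs=20, inner_steps=5, snap_batch=100,
              mini_batch=10, lr=0.02, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        # Snapshot: approximate the full gradient with a large batch.
        theta_snap = theta
        snap_actions = [rng.gauss(theta_snap, SIGMA) for _ in range(snap_batch)]
        mu_snap = sum(score_gradient(a, theta_snap) for a in snap_actions) / snap_batch
        for _ in range(inner_steps):
            # Inner step: small batch from the current (non-stationary) policy.
            actions = [rng.gauss(theta, SIGMA) for _ in range(mini_batch)]
            theta += lr * svrpg_gradient(theta, theta_snap, mu_snap, actions)
    return theta
```

Calling `run_svrpg()` returns the final policy mean, which should drift from its initial value toward `TARGET`. When `theta` equals `theta_snap`, every weight is 1 and the correction term vanishes, so the estimate reduces to `mu_snap`. The snapshot gradient is itself only a sample average here, which mirrors point II of the abstract (approximations in the full gradient computation).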

