LGAIMLMay 21, 2018

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

arXiv:1805.07956v214 citations
Originality Incremental advance
AI Analysis

This work addresses a theoretical difficulty in RL for practitioners, but it is incremental as it builds on prior analysis of multiple-step greedy policies.

The paper tackles the challenge of ensuring monotonic policy improvement in multiple-step greedy reinforcement learning algorithms, particularly with soft-policy updates, and formulates online and approximate algorithms to address this issue.

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work \cite{efroni2018beyond}, multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty, arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care about this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes