AI · LG · ML · Feb 10, 2018

Beyond the One Step Greedy Approach in Reinforcement Learning

arXiv:1802.03654v3 · 54 citations
Originality: Incremental advance
AI Analysis

This work provides foundational insights for researchers in reinforcement learning, though it is incremental as it builds on existing policy iteration concepts.

The authors address the lack of theoretical analysis of multiple-step lookahead policy improvement in reinforcement learning. They formulate variants of multiple-step improvement, derive new algorithms from them, prove that these algorithms converge, and show that recent successful algorithms fit within the same framework.

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
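To make the idea of multiple-step lookahead improvement concrete, here is a minimal sketch of policy iteration in which the improvement step plans $h$ steps ahead instead of one. It is an illustration under stated assumptions, not the authors' implementation: the tabular MDP encoding (`P`, `R`), the function names, the iterative evaluation routine, and the toy chain MDP are all hypothetical choices made for this example. Setting `h = 1` recovers the usual one-step greedy improvement.

```python
def bellman_backup(P, R, gamma, v):
    """Apply the optimal Bellman operator once; return (new values, greedy actions)."""
    new_v, greedy = [], []
    for s in range(len(P)):
        # Q-value of each action under the value estimate v.
        q = [R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
        best = max(range(len(q)), key=q.__getitem__)  # ties go to the lowest index
        new_v.append(q[best])
        greedy.append(best)
    return new_v, greedy


def h_greedy_policy(P, R, gamma, v, h):
    """First action of an optimal h-step plan that uses v as the terminal value."""
    for _ in range(h - 1):
        v = bellman_backup(P, R, gamma, v)[0]
    return bellman_backup(P, R, gamma, v)[1]


def evaluate_policy(P, R, gamma, policy, sweeps=500):
    """Approximate the value of a fixed policy by repeated Bellman sweeps."""
    v = [0.0] * len(P)
    for _ in range(sweeps):
        v = [R[s][policy[s]]
             + gamma * sum(p * v[s2] for p, s2 in P[s][policy[s]])
             for s in range(len(P))]
    return v


def h_policy_iteration(P, R, gamma, h, max_iters=100):
    """Alternate policy evaluation with h-step lookahead improvement."""
    policy = [0] * len(P)
    v = evaluate_policy(P, R, gamma, policy)
    for _ in range(max_iters):
        new_policy = h_greedy_policy(P, R, gamma, v, h)
        if new_policy == policy:
            break  # policy is stable under h-step improvement
        policy = new_policy
        v = evaluate_policy(P, R, gamma, policy)
    return policy, v


# Hypothetical toy MDP: a 4-state chain. Action 0 stays (reward 0.1),
# action 1 moves right (reward 0). State 3 is absorbing with reward 1.
# P[s][a] is a list of (probability, next_state) pairs.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 3)]],
    [[(1.0, 3)], [(1.0, 3)]],
]
R = [[0.1, 0.0], [0.1, 0.0], [0.1, 0.0], [1.0, 1.0]]
GAMMA = 0.9

policy_h1, values_h1 = h_policy_iteration(P, R, GAMMA, h=1)
policy_h3, values_h3 = h_policy_iteration(P, R, GAMMA, h=3)
```

In this toy chain both settings end at the same policy (move right until the absorbing state is reached). A larger `h` propagates the distant reward further back in a single improvement step, at the cost of more backups per step.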

