OCLGJul 21, 2020

On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

arXiv:2007.10916v117 citations
Originality Synthesis-oriented
AI Analysis

This work addresses a theoretical gap in reinforcement learning convergence analysis, which is incremental as it builds on prior partial results.

The paper tackles the open problem of proving convergence for the Monte Carlo Exploring Starts (MCES) reinforcement learning algorithm in the undiscounted stochastic shortest path setting, providing results that complement existing partial findings and help settle this question.

A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring States (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby helps further settle the open problem. As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes