On the convergence of optimistic policy iteration for stochastic shortest path problem
This work addresses convergence issues in reinforcement learning algorithms for stochastic shortest path problems, but it appears incremental as it focuses on a special case of an existing method.
The paper tackles the convergence of optimistic policy iteration for stochastic shortest path problems, proving results under conditions where the termination state is reached almost surely, using Monte Carlo and TD(λ) methods for policy evaluation.
In this paper, we prove convergence results for a special case of the optimistic policy iteration algorithm for stochastic shortest path problems. We consider both Monte Carlo and $TD(\lambda)$ methods for the policy evaluation step, under the condition that the termination state is eventually reached almost surely.
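
To make the setting concrete, the following is a minimal sketch of generic optimistic policy iteration with Monte Carlo policy evaluation on a small stochastic shortest path problem. It is not the specific special case analyzed in the paper. The three-state chain, its costs and transition probabilities, the 1/t step size, and all function names are hypothetical choices made for illustration. In this toy model every policy reaches the termination state with probability one, which mirrors the condition stated above.

```python
import random

# Hypothetical toy SSP (an assumption, not taken from the paper).
TERMINAL = 3          # absorbing, cost-free termination state
STATES = [0, 1, 2]    # non-terminal states of a small chain
ACTIONS = [0, 1]      # 0 = "safe" step, 1 = "risky" jump


def step(state, action, rng):
    """Sample (next_state, cost) from the toy model."""
    if action == 0:
        return state + 1, 1.0      # move one state closer to termination
    if rng.random() < 0.5:
        return TERMINAL, 0.8       # jump succeeds: terminate
    return state, 0.8              # jump fails: stay in place


def q_value(J, state, action):
    """Exact one-step lookahead cost under the toy model (J[TERMINAL] is 0)."""
    if action == 0:
        return 1.0 + J[state + 1]
    return 0.8 + 0.5 * J[TERMINAL] + 0.5 * J[state]


def greedy_policy(J):
    """Policy that minimizes the one-step lookahead cost in every state."""
    return [min(ACTIONS, key=lambda a: q_value(J, s, a)) for s in STATES]


def rollout_cost(start, policy, rng):
    """Total cost of one simulated trajectory from `start` to termination."""
    state, total = start, 0.0
    while state != TERMINAL:
        state, cost = step(state, policy[state], rng)
        total += cost
    return total


def optimistic_policy_iteration(num_iters=20000, seed=0):
    rng = random.Random(seed)
    J = [0.0] * (len(STATES) + 1)  # last entry belongs to the terminal state
    for t in range(1, num_iters + 1):
        # "Optimistic": switch to the greedy policy after a single,
        # noisy evaluation step instead of evaluating it to convergence.
        policy = greedy_policy(J)
        step_size = 1.0 / t        # diminishing step size (assumed schedule)
        returns = [rollout_cost(s, policy, rng) for s in STATES]
        for s in STATES:
            # Monte Carlo update of the start state toward the sampled cost.
            J[s] += step_size * (returns[s] - J[s])
    return J, greedy_policy(J)


J, policy = optimistic_policy_iteration()
```

Under these assumptions, solving the Bellman equation of the toy model by hand gives optimal costs of 1.6, 1.6 and 1.0 for states 0, 1 and 2, so the entries of `J` should settle near those values and `policy` should pick the risky jump in states 0 and 1 and the safe step in state 2. A $TD(\lambda)$ variant would replace the full-trajectory return in the update with an eligibility-trace-weighted sum of temporal differences.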