On Value Iteration Convergence in Connected MDPs
The paper provides theoretical convergence guarantees for Value Iteration in connected MDPs, a contribution that is incremental but important for reinforcement learning practitioners.
The paper proves that if a Markov Decision Process (MDP) has a unique optimal policy and an ergodic associated transition matrix, several versions of the Value Iteration algorithm converge at a geometric rate faster than the discount factor γ (that is, the error shrinks geometrically with a ratio strictly smaller than γ), under both the discounted and the average-reward criteria.
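To make the role of ergodicity concrete, here is a minimal Python sketch under stated assumptions. It is our own illustration, not the paper's method. The 3-state, 2-action MDP is made up, every transition row is strictly positive (a stronger condition than the paper's), and the helper names `bellman`, `span`, `min_row_overlap`, and `span_error_ratios` are hypothetical. For plain Value Iteration, adding a constant c to every state value shifts the Bellman backup by exactly γc, so such an offset is damped only by γ. The sketch therefore measures the error V_k - V* in the span seminorm sp(x) = max(x) - min(x), which ignores constant offsets. A standard Dobrushin-style argument (our choice, not necessarily the paper's proof) gives sp(e_{k+1}) ≤ γ(1 - m) sp(e_k), where m is the smallest overlap Σ_t min(p_t, q_t) between any two transition rows.

```python
# Sketch (our illustration, not the paper's construction): value iteration on a
# hypothetical 3-state, 2-action MDP whose transition rows are all strictly
# positive, so every policy induces an ergodic chain. The error V_k - V* is
# measured in the span seminorm sp(x) = max(x) - min(x).

GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# P[a][s] is the next-state distribution for action a in state s (made up).
P = [
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],  # action 0
    [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.3, 0.3, 0.4]],  # action 1
]
# R[a][s] is the expected immediate reward (made up).
R = [
    [1.0, 0.0, 0.5],  # action 0
    [0.0, 2.0, 1.0],  # action 1
]


def bellman(v):
    """One Bellman optimality backup: (Tv)(s) = max_a R[a][s] + GAMMA * P[a][s] . v."""
    return [
        max(R[a][s] + GAMMA * sum(p * v[t] for t, p in enumerate(P[a][s]))
            for a in range(N_ACTIONS))
        for s in range(N_STATES)
    ]


def span(x):
    """Span seminorm; unchanged when the same constant is added to every entry."""
    return max(x) - min(x)


def min_row_overlap():
    """Smallest overlap sum_t min(p_t, q_t) over all pairs of transition rows
    (any states, any actions). Each VI step shrinks the span error by at least
    the factor GAMMA * (1 - overlap)."""
    rows = [P[a][s] for a in range(N_ACTIONS) for s in range(N_STATES)]
    return min(sum(min(pt, qt) for pt, qt in zip(p, q))
               for p in rows for q in rows)


def span_error_ratios(n_steps=20, n_ref=2000, tol=1e-9):
    """Run VI from V_0 = 0 and return sp(e_{k+1}) / sp(e_k) for early steps.

    V* is approximated by n_ref backups; ratios are only reported while the
    span error stays well above floating-point noise (tol).
    """
    v_star = [0.0] * N_STATES
    for _ in range(n_ref):
        v_star = bellman(v_star)

    v = [0.0] * N_STATES
    errors = []
    for _ in range(n_steps + 1):
        errors.append(span([a - b for a, b in zip(v, v_star)]))
        v = bellman(v)
    return [errors[k + 1] / errors[k]
            for k in range(n_steps) if errors[k + 1] > tol]


if __name__ == "__main__":
    bound = GAMMA * (1.0 - min_row_overlap())
    ratios = span_error_ratios()
    print(f"gamma = {GAMMA}, span contraction bound = {bound:.3f}")
    print("observed span ratios:", [round(r, 3) for r in ratios])
```

In this made-up MDP the smallest row overlap is m = 0.4, so the bound guarantees that each step shrinks the span error by a factor of at most 0.9 × 0.6 = 0.54 rather than 0.9 (up to floating-point error); the script prints the observed ratios alongside that bound.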