LG · Jun 13, 2024

On Value Iteration Convergence in Connected MDPs

Arsenii Mustafin, Alex Olshevsky, Ioannis Ch. Paschalidis

arXiv:2406.09592v1 · 4.6 · 1 citation

Originality: Synthesis-oriented

AI Analysis

The paper provides theoretical convergence guarantees for Value Iteration in connected MDPs. The result is incremental, but it matters to reinforcement learning practitioners.

The paper proves that in Markov Decision Processes with a unique optimal policy and ergodic transitions, Value Iteration converges at a geometric rate strictly faster than the discount factor γ, in both the discounted and the average-reward settings.
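To make the claim concrete, here is a minimal sketch on a hypothetical two-state, two-action MDP of our own. The transition matrices, rewards, and γ = 0.9 are illustrative assumptions, not taken from the paper. Its optimal policy is unique and that policy's transition matrix has all positive entries, so it is ergodic. The sketch runs standard Value Iteration and measures the per-iteration error ratio in two ways: the sup norm, and the span seminorm max(x) - min(x), which ignores constant offsets that never change the greedy policy. Using the span seminorm to expose the speedup is our assumption; the paper's exact error measure may differ.

```python
# Toy example only: a hypothetical 2-state, 2-action MDP (not from the paper).
# P[a][s][t] = probability of moving from state s to t under action a.
# R[a][s]    = immediate reward for taking action a in state s.
P = [
    [[0.7, 0.3], [0.4, 0.6]],  # action 0: optimal in both states, all entries > 0
    [[0.1, 0.9], [0.9, 0.1]],  # action 1: strictly worse in both states
]
R = [
    [1.0, 2.0],
    [0.0, 0.0],
]
GAMMA = 0.9


def bellman(v):
    """Bellman optimality update: (Tv)(s) = max_a [R[a][s] + GAMMA * sum_t P[a][s][t] v(t)]."""
    n = len(v)
    return [
        max(R[a][s] + GAMMA * sum(P[a][s][t] * v[t] for t in range(n))
            for a in range(len(P)))
        for s in range(n)
    ]


def sup_norm(x):
    return max(abs(xi) for xi in x)


def span(x):
    # Span seminorm: ignores any constant shift added to every entry of x.
    return max(x) - min(x)


def value_iteration(v0, iters):
    """Return the iterates [v0, T v0, T^2 v0, ..., T^iters v0]."""
    vs = [v0]
    for _ in range(iters):
        vs.append(bellman(vs[-1]))
    return vs


# Approximate V* by iterating until GAMMA**k is far below machine precision.
v_star = value_iteration([0.0, 0.0], 1000)[-1]

iterates = value_iteration([0.0, 0.0], 12)
errors = [[vi - si for vi, si in zip(v, v_star)] for v in iterates]

k = 10
sup_ratio = sup_norm(errors[k + 1]) / sup_norm(errors[k])
span_ratio = span(errors[k + 1]) / span(errors[k])

print("V* ~", v_star)
print(f"sup-norm error ratio at k={k}: {sup_ratio:.4f}")
print(f"span error ratio at k={k}:     {span_ratio:.4f}")
```

For this toy MDP the sup-norm ratio stays at about γ = 0.9, because the component of the error along the all-ones direction shrinks by exactly γ per step. The span ratio is about 0.27 = 0.9 × 0.3, where 0.3 is the second eigenvalue of the optimal policy's transition matrix; the greedy policy never changes here, which is what makes that factor exact.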

This paper establishes that, for an MDP with a unique optimal policy whose associated transition matrix is ergodic, various versions of the Value Iteration algorithm converge at a geometric rate faster than the discount factor γ, under both the discounted and the average-reward criteria.
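For the average-reward criterion there is no discount factor, so plain Value Iteration iterates grow without bound. A common remedy is relative value iteration, which subtracts the value at a fixed reference state after each Bellman update. The sketch below applies it to a hypothetical two-state, two-action MDP of our own (REF_STATE and the helper names are illustrative). The paper studies "various versions" of Value Iteration, which may not include this exact variant.

```python
# Toy example only: a hypothetical 2-state, 2-action MDP (not from the paper).
P = [
    [[0.7, 0.3], [0.4, 0.6]],  # action 0: optimal in both states
    [[0.1, 0.9], [0.9, 0.1]],  # action 1: strictly worse
]
R = [
    [1.0, 2.0],
    [0.0, 0.0],
]
REF_STATE = 0  # hypothetical reference state; any fixed state would do here


def undiscounted_bellman(h):
    """(Th)(s) = max_a [R[a][s] + sum_t P[a][s][t] h(t)], with no discounting."""
    n = len(h)
    return [
        max(R[a][s] + sum(P[a][s][t] * h[t] for t in range(n))
            for a in range(len(P)))
        for s in range(n)
    ]


def span(x):
    # Span seminorm: ignores any constant shift added to every entry of x.
    return max(x) - min(x)


def relative_value_iteration(h0, iters):
    """Relative VI: after each update, subtract the value at REF_STATE.

    Returns the bias iterates h_k and the gain estimates (T h_{k-1})(REF_STATE).
    """
    hs, gains = [h0], []
    for _ in range(iters):
        th = undiscounted_bellman(hs[-1])
        g = th[REF_STATE]
        gains.append(g)
        hs.append([x - g for x in th])
    return hs, gains


hs, gains = relative_value_iteration([0.0, 0.0], 60)
h_star, g_star = hs[-1], gains[-1]

# Span-seminorm distance of each iterate from the (numerically converged) limit.
errs = [span([a - b for a, b in zip(h, h_star)]) for h in hs]

k = 10
print(f"gain ~ {g_star:.6f}, bias ~ {h_star}")
print(f"span error ratio at k={k}: {errs[k + 1] / errs[k]:.4f}")
```

Here the gain estimate settles at 10/7 ≈ 1.4286, the stationary average reward of the optimal policy, and the span error shrinks by a factor of 0.3 per step, the second eigenvalue of that policy's transition matrix. As in the discounted case, the greedy policy never changes in this example, which is why the factor is exact.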

