LG · Jun 13, 2024

On Value Iteration Convergence in Connected MDPs

arXiv:2406.09592v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

The paper provides theoretical convergence guarantees for Value Iteration in connected MDPs. The contribution is incremental, but it is relevant to reinforcement learning practitioners.

The paper proves that in Markov Decision Processes with a unique optimal policy and an ergodic transition matrix, several versions of Value Iteration converge at a geometric rate strictly faster than the discount factor γ. The result holds under both the discounted and the average-reward criteria.
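The following is a standard heuristic argument for why a unique optimal policy and ergodicity can beat the γ rate. It is a sketch, not the paper's proof. Measuring progress with the span seminorm sp and the Dobrushin coefficient τ is our assumption, since the summary does not name the norm. Once the greedy policy for every iterate equals the unique optimal policy π*, the successive differences d_k = V_{k+1} − V_k evolve linearly:

```latex
% Bellman optimality operator:
%   (TV)(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \big]
% Assume \pi^* is greedy for every iterate from V_k on, and let d_k := V_{k+1} - V_k. Then
d_{k+1} = TV_{k+1} - TV_k = \gamma\, P_{\pi^*}\, d_k .
% Span seminorm and Dobrushin (ergodicity) coefficient of a stochastic matrix P:
\operatorname{sp}(x) = \max_s x(s) - \min_s x(s), \qquad
\tau(P) = \tfrac{1}{2} \max_{i,j} \sum_{s} \lvert P_{is} - P_{js} \rvert, \qquad
\operatorname{sp}(Px) \le \tau(P)\, \operatorname{sp}(x).
% Iterating m steps:
\operatorname{sp}(d_{k+m}) \le \gamma^{m}\, \tau\!\big(P_{\pi^*}^{m}\big)\, \operatorname{sp}(d_k).
% If P_{\pi^*} is ergodic, some power P_{\pi^*}^{m} is strictly positive, so
% \tau(P_{\pi^*}^{m}) < 1 and the span shrinks by strictly less than \gamma^m every m steps.
```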

This paper establishes that an MDP with a unique optimal policy and ergodic associated transition matrix ensures the convergence of various versions of the Value Iteration algorithm at a geometric rate that exceeds the discount factor γ for both discounted and average-reward criteria.
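The summary does not say in which norm the faster-than-γ rate holds. The sketch below assumes the span seminorm sp(x) = max x − min x, which ignores the constant offset that plain VI keeps decaying at rate γ. It builds a small random MDP whose transition rows are all strictly positive, so every policy's chain is ergodic and ties between actions (and hence multiple optimal policies) have probability zero. It then runs standard VI and estimates the per-step decay of sp(V_{k+1} − V_k). The helpers `make_mdp`, `span`, `value_iteration`, and `empirical_rate` are hypothetical, and so are the parameters (6 states, 3 actions, stickiness 0.6, γ = 0.9). None of them come from the paper, and only plain VI is shown, not the paper's other variants.

```python
# Hypothetical illustration: not the paper's code. Plain synchronous VI only;
# the other VI variants analysed in the paper are not reproduced here.
import numpy as np


def make_mdp(n_states=6, n_actions=3, stickiness=0.6, seed=0):
    """Random MDP whose transition rows are strictly positive, so every
    policy's transition matrix is ergodic (irreducible and aperiodic)."""
    rng = np.random.default_rng(seed)
    # P[a, s, :] is the next-state distribution for action a in state s.
    noise = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    P = stickiness * np.eye(n_states)[None, :, :] + (1.0 - stickiness) * noise
    # Continuous random rewards make ties (multiple optimal policies) a
    # probability-zero event.
    R = rng.uniform(0.0, 1.0, size=(n_actions, n_states))
    return P, R


def span(x):
    """Span seminorm sp(x) = max(x) - min(x); it ignores constant shifts."""
    return float(x.max() - x.min())


def value_iteration(P, R, gamma, n_iters):
    """Standard value iteration; records every difference V_{k+1} - V_k."""
    V = np.zeros(P.shape[1])
    diffs = []
    for _ in range(n_iters):
        # Q[a, s] = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        diffs.append(V_new - V)
        V = V_new
    greedy = (R + gamma * P @ V).argmax(axis=0)
    return V, greedy, diffs


def empirical_rate(diffs, tol=1e-9):
    """Geometric-mean per-step decay of sp(V_{k+1} - V_k), measured on the
    second half of the iterations whose span is still above `tol`
    (below that, floating-point noise dominates)."""
    spans = np.array([span(d) for d in diffs])
    below = np.flatnonzero(spans <= tol)
    end = int(below[0]) if below.size else len(spans)
    tail = spans[end // 2:end]
    if len(tail) < 2:
        raise ValueError("too few usable iterations; lower tol or raise n_iters")
    return float((tail[-1] / tail[0]) ** (1.0 / (len(tail) - 1)))


gamma = 0.9
P, R = make_mdp()
V, policy, diffs = value_iteration(P, R, gamma, n_iters=300)

# Transition matrix of the greedy (here, optimal) policy: row s is P[policy[s], s, :].
P_pi = P[policy, np.arange(P.shape[1]), :]
# Eigenvalue moduli, largest first; lam[0] is 1 for a stochastic matrix.
lam = np.sort(np.abs(np.linalg.eigvals(P_pi)))[::-1]

rate = empirical_rate(diffs)
print(f"gamma               = {gamma}")
print(f"empirical span rate = {rate:.4f}")
print(f"gamma * |lambda_2|  = {gamma * lam[1]:.4f}")
```

After the greedy policy stops changing, the differences obey d_{k+1} = γ P_π d_k. One would therefore expect the estimated span rate to approach γ·|λ₂(P_π)|, where λ₂ is the second-largest eigenvalue in modulus of the optimal policy's transition matrix, and this is below γ whenever that chain is ergodic. The tolerance cut-off is a practical choice: once the spans reach round-off level, their ratios say nothing about the algorithm.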

