Geometric Re-Analysis of Classical MDP Solving Algorithms
This work provides incremental theoretical insights for researchers in reinforcement learning and optimization, focusing on convergence analysis of established algorithms.
The paper tackles the problem of analyzing classical MDP-solving algorithms like Value Iteration and Policy Iteration using a geometric interpretation, resulting in improved convergence guarantees, such as showing that the asymptotic convergence rate of value iteration can be strictly smaller than the discount factor γ under certain conditions.
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $γ$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when a Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $γ$.