Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
This work provides a theoretical unification for MDP analysis, which is incremental as it extends existing geometric methods from discounted to average reward cases.
The authors tackled the problem of separately analyzing discounted and average reward Markov Decision Processes (MDPs) by extending a geometric interpretation to unify both cases, resulting in a proof that Value Iteration achieves geometric convergence under a unique and ergodic optimal policy for average-reward MDPs.
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.