LGOct 27, 2025

Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs

arXiv:2510.23914v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work provides a theoretical unification for MDP analysis, which is incremental as it extends existing geometric methods from discounted to average reward cases.

The authors tackled the problem of separately analyzing discounted and average reward Markov Decision Processes (MDPs) by extending a geometric interpretation to unify both cases, resulting in a proof that Value Iteration achieves geometric convergence under a unique and ergodic optimal policy for average-reward MDPs.

The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes