AICCJan 30

Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

arXiv:2601.23229v14 citationsh-index: 5
Originality Highly original
AI Analysis

This resolves an important open problem in optimization for sequential decision-making under uncertainty, providing a foundational algorithmic advance.

The paper tackles the problem of finding strongly-polynomial time algorithms for robust Markov decision processes (RMDPs) with L∞ uncertainty sets, and shows that a robust policy iteration algorithm achieves this for a constant discount factor.

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly--polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes