AIJan 23, 2013

On the Complexity of Policy Iteration

arXiv:1301.6718v1103 citations
Originality Highly original
AI Analysis

This work addresses a foundational theoretical problem in reinforcement learning and decision-making, providing new insights into PI's progression through policy space.

The paper tackles the problem of bounding the complexity of policy iteration (PI) in Markov decision processes without dependence on the discount factor, proving the first non-trivial worst-case upper bounds on the number of iterations required for convergence to the optimal policy.

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes