The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming
This is an incremental theoretical result clarifying computational complexity for researchers in dynamic programming and optimization.
The paper demonstrates that the value iteration algorithm for discounted dynamic programming can require an exponential number of iterations in the number of actions, showing it is not strongly polynomial, unlike policy iteration.
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iterations, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.