Analysis of Value Iteration Through Absolute Probability Sequences
This work offers reinforcement learning researchers a new analytical perspective, though it may be incremental, since it builds on existing convergence studies.
The authors analyze Value Iteration for Markov Decision Processes using absolute probability sequences, which lets them examine convergence in the L² norm rather than the infinity norm used in most earlier work.
Value Iteration is a widely used algorithm for solving Markov Decision Processes (MDPs). While previous studies have extensively analyzed its convergence properties, they primarily focus on convergence with respect to the infinity norm. In this work, we use absolute probability sequences to develop a new line of analysis and examine the algorithm's convergence in terms of the $L^2$ norm, offering a new perspective on its behavior and performance.
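
As a rough illustration of the two norms mentioned above, the following sketch runs standard Value Iteration on a small made-up MDP and records the change between successive iterates in both the infinity norm and the plain Euclidean norm. The toy transition and reward tables, the function names, and the use of an unweighted Euclidean norm are all assumptions made for illustration. It is not the paper's analysis and does not use absolute probability sequences.

```python
import math


def bellman_update(V, P, R, gamma):
    """Apply the Bellman optimality operator once.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[a][s]:    expected immediate reward for taking action a in state s.
    """
    n_states = len(V)
    n_actions = len(P)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman update from V = 0 until the sup-norm change < tol.

    Returns the final value estimate and a history of
    (infinity-norm change, Euclidean-norm change) per iteration.
    """
    V = [0.0] * len(P[0])
    history = []
    for _ in range(max_iter):
        V_new = bellman_update(V, P, R, gamma)
        diff = [abs(x - y) for x, y in zip(V_new, V)]
        sup_norm = max(diff)
        l2_norm = math.sqrt(sum(d * d for d in diff))
        history.append((sup_norm, l2_norm))
        V = V_new
        if sup_norm < tol:
            break
    return V, history


# Hypothetical 2-state, 2-action MDP (numbers chosen only for illustration).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
R = [
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
]
GAMMA = 0.9

V, history = value_iteration(P, R, gamma=GAMMA)
for k, (sup_norm, l2_norm) in enumerate(history[:5]):
    print(f"iter {k}: sup-norm change = {sup_norm:.6f}, L2 change = {l2_norm:.6f}")
```

In this sketch the infinity-norm change shrinks by at least a factor of `gamma` per step, which is the classical contraction argument. For a vector with n entries the Euclidean norm always lies between the infinity norm and sqrt(n) times the infinity norm, so the second column can be compared directly against the first.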