OCLGMLMay 3, 2025

Rank-One Modified Value Iteration

arXiv:2505.01828v32 citationsh-index: 27ICML
Originality Incremental advance
AI Analysis

This provides an improved algorithm for researchers and practitioners in reinforcement learning, though it appears incremental as it builds on existing policy iteration methods.

The paper tackles planning and learning problems in Markov decision processes by proposing a novel algorithm that uses rank-one approximation of transition probability matrices in policy evaluation, achieving the same theoretical convergence rate and complexity as value iteration and Q-learning while consistently outperforming first-order and accelerated algorithms in simulations.

In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees for the convergence of the proposed algorithm to optimal (action-)value function with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes