Switching-Geometry Analysis of Deflated Q-Value Iteration
For researchers analyzing the convergence of reinforcement learning algorithms, this work offers a more precise geometric understanding of Q-value iteration, but the result is incremental: it does not improve the algorithm's performance.
The paper develops a joint spectral radius (JSR) framework for analyzing deflated Q-value iteration (Q-VI) in discounted MDPs. It shows that the JSR of the projected switching system can be strictly smaller than the discount factor, which yields a sharper convergence-rate characterization. However, deflation does not change the greedy-policy sequence relative to standard Q-VI.
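The claim that the projected JSR can fall strictly below the discount factor can be probed numerically on a toy MDP. The sketch below is only an illustration under stated assumptions: it assumes the common switching-system form in which each subsystem matrix is $\gamma P \Pi_\pi$ for a deterministic policy $\pi$ (the paper's exact model may differ), and the random MDP and the helper names `subsystem_matrices`, `complement_basis`, and `jsr_bounds` are hypothetical. It checks that every subsystem maps the all-ones vector to $\gamma$ times itself (so the ambient spectral radius and $\infty$-norm both equal $\gamma$), then brackets the JSR of the subsystems restricted to the quotient space using all products of length $k$.

```python
import functools
import itertools
import numpy as np

def subsystem_matrices(P, n_states, n_actions, gamma):
    """Assumed Q-VI switching-system model (hypothetical helper): gamma * P @ Pi_pi per deterministic policy."""
    n = n_states * n_actions
    mats = []
    for policy in itertools.product(range(n_actions), repeat=n_states):
        # Pi_pi maps next state s' to the Q-entry (s', pi(s')), flattened as s' * n_actions + pi(s').
        Pi = np.zeros((n_states, n))
        for s, a in enumerate(policy):
            Pi[s, s * n_actions + a] = 1.0
        mats.append(gamma * P @ Pi)
    return mats

def complement_basis(n):
    """Orthonormal basis U (n x (n-1)) of the orthogonal complement of the all-ones vector."""
    _, _, vt = np.linalg.svd(np.ones((1, n)))
    return vt[1:].T  # first right singular vector is +-ones/sqrt(n); the remaining rows span its complement

def jsr_bounds(mats, k):
    """Crude bounds from all length-k products: max rho^(1/k) <= JSR <= max ||.||_2^(1/k)."""
    lo = hi = 0.0
    for combo in itertools.product(mats, repeat=k):
        M = functools.reduce(np.matmul, combo)
        lo = max(lo, np.max(np.abs(np.linalg.eigvals(M))) ** (1.0 / k))
        hi = max(hi, np.linalg.norm(M, 2) ** (1.0 / k))
    return lo, hi

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
n = n_states * n_actions
P = rng.random((n, n_states))
P /= P.sum(axis=1, keepdims=True)  # row (s, a) is a distribution over next states

mats = subsystem_matrices(P, n_states, n_actions, gamma)
U = complement_basis(n)
# span(1) is invariant under every subsystem, so U^T A U is the induced map on the quotient space.
projected = [U.T @ A @ U for A in mats]

# Each A is gamma times a row-stochastic matrix, so both quantities below equal gamma.
ambient_rho = max(np.max(np.abs(np.linalg.eigvals(A))) for A in mats)
ambient_inf = max(np.linalg.norm(A, np.inf) for A in mats)
print(f"ambient: max spectral radius {ambient_rho:.4f}, max inf-norm {ambient_inf:.4f}")

lo, hi = jsr_bounds(projected, k=2)
print(f"projected JSR bracket (k=2): [{lo:.4f}, {hi:.4f}] vs gamma = {gamma}")
```

If the upper end of the printed bracket is below `gamma`, that certifies a projected JSR strictly smaller than $\gamma$ for this particular instance. Otherwise, raising `k` tightens the bracket, at the cost of enumerating $|\Sigma|^k$ products.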
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has a JSR exactly equal to the discount factor $\gamma \in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem but a more precise JSR-based description of the convergence geometry once the redundant all-ones component is removed.
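The scalar-recentering claim rests on the shift identity $T(Q - s\mathbf{1}) = T(Q) - \gamma s\mathbf{1}$ for the Bellman optimality operator $T$, which holds because every row of the transition kernel sums to one. By induction, any correction that subtracts a scalar multiple of $\mathbf{1}$ keeps the deflated iterate equal to the standard iterate minus a scalar shift $s_k$, with $s_{k+1} = \gamma s_k + c_k$. The sketch below checks this numerically. The specific correction (subtracting the mean Bellman residual from every entry) is a hypothetical stand-in for the paper's all-ones residual correction, as are the names `bellman` and `deflated_step` and the random MDP; the argument only needs the correction to be some scalar multiple of the all-ones vector.

```python
import numpy as np

def bellman(Q, R, P, gamma):
    """Standard Q-VI operator: (TQ)(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    return R + gamma * P @ Q.max(axis=1)

def deflated_step(Q, R, P, gamma):
    """Hypothetical all-ones residual correction: subtract the mean Bellman residual from every entry.

    This choice of scalar c is an assumption for illustration; any scalar gives the same conclusion.
    """
    TQ = bellman(Q, R, P, gamma)
    c = np.mean(TQ - Q)
    return TQ - c, c

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 3, 0.95
R = rng.standard_normal((n_states, n_actions))
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)  # P[s, a] is a distribution over next states

Q_std = np.zeros((n_states, n_actions))
Q_def = Q_std.copy()  # same initial point for both runs
shift = 0.0  # invariant: Q_def == Q_std - shift * ones
same_policy = True
for _ in range(50):
    Q_std = bellman(Q_std, R, P, gamma)
    Q_def, c = deflated_step(Q_def, R, P, gamma)
    # T(Q - s*1) = T(Q) - gamma*s*1, so the scalar shift evolves as s <- gamma*s + c.
    shift = gamma * shift + c
    assert np.allclose(Q_def, Q_std - shift), "iterates differ by more than a scalar shift"
    same_policy = same_policy and np.array_equal(Q_std.argmax(axis=1), Q_def.argmax(axis=1))

print("greedy policies identical at every step:", same_policy)
```

Subtracting the per-iterate mean from `Q_std` and `Q_def` gives the same array, which corresponds to the projected (quotient-space) trajectory. Because `argmax` ignores constant shifts, the greedy policies coincide as well.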