LGOCJul 15, 2024

Deflated Dynamics Value Iteration

arXiv:2407.10454v24 citationsh-index: 15
Originality Incremental advance
AI Analysis

This addresses a bottleneck in reinforcement learning algorithms for applications requiring fast value function estimation, though it is an incremental improvement over existing methods.

The paper tackles the slow convergence of Value Iteration (VI) in reinforcement learning when the discount factor is near 1 by proposing Deflated Dynamics Value Iteration (DDVI), which accelerates computation to achieve a convergence rate of Σ(γ^k |λ_{s+1}|^k).

The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a function of iteration $k$ is $O(γ^k)$, it is slow when the discount factor $γ$ is close to $1$. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top $s$ dominant eigen-structure of the transition matrix $\mathcal{P}^π$. We prove that this leads to a $\tilde{O}(γ^k |λ_{s+1}|^k)$ convergence rate, where $λ_{s+1}$is $(s+1)$-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes