LG, AI, SY | Jun 29, 2023

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

UW
arXiv:2306.16750v2 | 5 citations | h-index: 33 | Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of efficient and stable value approximation in reinforcement learning. The approach is novel and broadly applicable, though incremental in that it builds on existing TD methods.

The authors tackle value approximation in deep reinforcement learning by proposing the Eigensubspace Regularized Critic (ERC), which uses the 1-eigensubspace of the transition kernel to guide the approximation error. ERC outperforms state-of-the-art methods on 20 of 26 DMControl benchmark tasks and reduces the variance of the value functions.

We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. In addition, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods on 20. It also shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.
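The claim that the approximation error follows a path defined by the 1-eigensubspace can be illustrated with a small tabular sketch. This is not the authors' code. It assumes exact policy evaluation on a randomly generated Markov chain, where the error obeys e_{k+1} = gamma * P * e_k; the matrix `P`, the rewards `r`, and the helper `off_subspace_ratio` are all hypothetical names introduced for illustration. Because every row of a transition matrix sums to 1, the all-ones vector is an eigenvector with eigenvalue 1, and the sketch measures how much of the error lies outside its span.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 6, 0.9

# Hypothetical row-stochastic transition matrix under a fixed policy.
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n) + 0.5  # hypothetical positive rewards

# The all-ones vector spans the 1-eigensubspace: P @ 1 = 1.
ones = np.ones(n)
assert np.allclose(P @ ones, ones)

# Exact fixed point of Q = r + gamma * P @ Q.
q_star = np.linalg.solve(np.eye(n) - gamma * P, r)

def off_subspace_ratio(err):
    """Fraction of the error norm lying outside span{ones}."""
    return np.linalg.norm(err - err.mean()) / np.linalg.norm(err)

# Tabular value iteration for the fixed policy; the error e_k = q - q_star
# satisfies e_{k+1} = gamma * P @ e_k.
q = np.zeros(n)
ratios = []
for _ in range(30):
    ratios.append(off_subspace_ratio(q - q_star))
    q = r + gamma * P @ q

print(f"initial off-subspace ratio: {ratios[0]:.3e}")
print(f"final off-subspace ratio:   {ratios[-1]:.3e}")
```

In this toy setting the error shrinks overall by a factor of gamma per step, while its component outside the span of the ones vector shrinks faster, so the error becomes nearly constant across states. A regularizer in the spirit of the abstract would presumably penalize that off-subspace component, but the exact form used by ERC is not stated here.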

