LGAI · Sep 28, 2024

Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning

arXiv:2409.19231v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work targets value estimation in reinforcement learning and offers practitioners a simpler, competitive method. It appears incremental because it builds on existing double actor-critic frameworks.

The paper aims to improve value estimation in reinforcement learning by proposing TDDR, a double actor-critic algorithm with temporal difference (TD) error-driven regularization. TDDR is strongly competitive on continuous control tasks and introduces no extra hyperparameters.

To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with each actor paired with a critic, thereby fully leveraging the advantages of double critics. Additionally, TDDR introduces an innovative critic regularization architecture. Compared to classical deterministic policy gradient-based algorithms that lack a double actor-critic structure, TDDR provides superior estimation. Moreover, unlike existing algorithms with double actor-critic frameworks, TDDR does not introduce any additional hyperparameters, significantly simplifying the design and implementation process. Experiments demonstrate that TDDR exhibits strong competitiveness compared to benchmark algorithms in challenging continuous control tasks.
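The abstract names the ingredients (two actors, each paired with a critic, and TD errors) but does not give the update rule. As a rough illustration only, the sketch below computes the standard one-step TD error for each actor-critic pair. All names here (`td_error`, `pairwise_td_errors`, the callable types) are hypothetical, and how TDDR turns these errors into a critic regularizer is not stated in this text, so the sketch stops at computing them.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases for illustration; not taken from the paper.
State = Sequence[float]
Action = Sequence[float]
Actor = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def td_error(reward: float, gamma: float, done: float,
             q_sa: float, q_next: float) -> float:
    """Standard one-step TD error:
    delta = r + gamma * (1 - done) * Q'(s', a') - Q(s, a)."""
    target = reward + gamma * (1.0 - done) * q_next
    return target - q_sa


def pairwise_td_errors(state: State, action: Action, reward: float,
                       next_state: State, done: float, gamma: float,
                       critics: Sequence[Critic],
                       target_actors: Sequence[Actor],
                       target_critics: Sequence[Critic]) -> List[float]:
    """Return one TD error per actor-critic pair.

    Assumption: each target actor proposes the next action that its own
    paired target critic evaluates. The source only says each actor is
    paired with a critic; the exact pairing inside the target is a guess.
    """
    errors = []
    for critic, t_actor, t_critic in zip(critics, target_actors, target_critics):
        next_action = t_actor(next_state)            # a' from this pair's target actor
        q_next = t_critic(next_state, next_action)   # Q'(s', a') from this pair's target critic
        q_sa = critic(state, action)                 # current estimate Q(s, a)
        errors.append(td_error(reward, gamma, done, q_sa, q_next))
    return errors
```

With two pairs, `pairwise_td_errors` returns a two-element list. An algorithm in this family could compare or combine those two values when forming the critic target, but that step is left out here because the text does not describe it.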

