LG AI · Sep 28, 2024

Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning

Haohui Chen, Zhiyong Chen, Aoxiang Liu, Wentuo Fang

arXiv:2409.19231v16.45 citations · h-index: 3

Originality: Incremental advance

AI Analysis

This work targets value estimation in reinforcement learning and offers practitioners a simpler, competitive method. It appears incremental because it builds on existing double actor-critic frameworks.

The paper aims to improve value estimation in reinforcement learning by proposing TDDR, a double actor-critic algorithm with temporal difference (TD) error-driven regularization. TDDR is competitive on continuous control tasks and introduces no extra hyperparameters.

To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with each actor paired with a critic, thereby fully leveraging the advantages of double critics. Additionally, TDDR introduces an innovative critic regularization architecture. Compared to classical deterministic policy gradient-based algorithms that lack a double actor-critic structure, TDDR provides superior estimation. Moreover, unlike existing algorithms with double actor-critic frameworks, TDDR does not introduce any additional hyperparameters, significantly simplifying the design and implementation process. Experiments demonstrate that TDDR exhibits strong competitiveness compared to benchmark algorithms in challenging continuous control tasks.
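The abstract does not spell out how the TD error drives the critic regularization, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that each of the two target actors proposes a next action, that each proposal is scored with the smaller of two target-critic values (clipped double Q-learning, as used in TD3), and that the candidate target with the smaller absolute TD error is kept. The function names `clipped_target` and `td_error_driven_target`, the argument layout, and the selection rule itself are hypothetical and chosen for illustration.

```python
from typing import Tuple


def clipped_target(reward: float, gamma: float, done: bool,
                   q1_next: float, q2_next: float) -> float:
    """TD target built from the smaller of two target-critic estimates."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def td_error_driven_target(reward: float, gamma: float, done: bool,
                           q_current: float,
                           next_q_actor1: Tuple[float, float],
                           next_q_actor2: Tuple[float, float]) -> Tuple[float, float]:
    """Pick one of two candidate targets by TD error (ASSUMED rule).

    next_q_actor1 / next_q_actor2 hold the two target-critic values
    evaluated at the next action proposed by target actor 1 / actor 2.
    Returns (target, td_error) for the candidate whose TD error has the
    smaller magnitude. No tunable coefficient is involved in the choice.
    """
    candidates = []
    for q1_next, q2_next in (next_q_actor1, next_q_actor2):
        target = clipped_target(reward, gamma, done, q1_next, q2_next)
        td_error = target - q_current
        candidates.append((target, td_error))
    # Assumption: prefer the candidate closest to the current estimate.
    return min(candidates, key=lambda c: abs(c[1]))


if __name__ == "__main__":
    # Toy numbers, not taken from the paper.
    target, delta = td_error_driven_target(
        reward=1.0, gamma=0.5, done=False, q_current=2.0,
        next_q_actor1=(4.0, 6.0), next_q_actor2=(2.0, 3.0))
    print(target, delta)
```

Under this reading, the choice between the two candidates depends only on quantities the critics already compute, which is consistent with the abstract's claim that no additional hyperparameters are introduced. The actual regularizer in the paper may differ.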
