LG · AI · Nov 7, 2023

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

arXiv:2311.03711v1 · 2 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses estimation error, a key problem for deep reinforcement learning practitioners, and offers incremental improvements over existing methods.

The paper tackles estimation bias in deep reinforcement learning by introducing a twin TD-regularized actor-critic method, which reduces errors and, when combined with other improvements, outperforms baselines and achieves new state-of-the-art performance in DeepMind Control Suite environments.

We address the issue of estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new twin TD-regularized actor-critic (TDR) method, which aims to reduce both over- and under-estimation errors. By combining TDR with established DRL improvements, such as distributional learning and the long N-step surrogate stage reward (LNSS) method, we show that TDR-based actor-critic learning enables DRL methods to outperform their respective baselines in challenging environments in the DeepMind Control Suite. Furthermore, these mechanisms elevate TD3 and SAC to a level of performance comparable to that of D4PG (the current SOTA), and they also improve D4PG to a new SOTA level as measured by mean reward, convergence speed, learning success rate, and learning variance.

