Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning
This paper addresses estimation error, a key problem for deep reinforcement learning practitioners, and offers incremental improvements over existing methods.
The paper tackles estimation bias in deep reinforcement learning by introducing a twin TD-regularized actor-critic method, which reduces errors and, when combined with other improvements, outperforms baselines and achieves new state-of-the-art performance in DeepMind Control Suite environments.
We address the issue of estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new twin TD-regularized actor-critic (TDR) method, which aims to reduce both overestimation and underestimation errors. By combining TDR with established DRL improvements, such as distributional learning and the long N-step surrogate stage reward (LNSS) method, we show that TDR-based actor-critic learning enables DRL methods to outperform their respective baselines in challenging environments in the DeepMind Control Suite. Furthermore, these mechanisms elevate TD3 and SAC to a level of performance comparable to that of D4PG (the current SOTA), and they improve D4PG itself to a new SOTA level as measured by mean reward, convergence speed, learning success rate, and learning variance.
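The two kinds of estimation error mentioned above can be illustrated with a small numerical sketch. This is not the TDR method, whose details are not given here. It only shows, under assumed zero-mean Gaussian critic noise, that taking a maximum over noisy value estimates tends to overestimate, while taking the minimum of two critics (as in the clipped double-Q target used by TD3) tends to underestimate. The function name `estimation_bias` and all of its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def estimation_bias(n_actions=10, noise_std=1.0, n_trials=20_000, seed=0):
    """Return (over_bias, under_bias) for two simple value estimators.

    Assumption: every action has true value 0, and each critic's estimate
    is the true value plus independent zero-mean Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)

    # Two independent noisy critics, one row per trial.
    q1 = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q2 = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Max over actions of a single noisy critic: biased upward.
    over_bias = q1.max(axis=1).mean() - true_q.max()

    # Min of two critics at a fixed action: biased downward.
    under_bias = np.minimum(q1[:, 0], q2[:, 0]).mean() - true_q[0]

    return over_bias, under_bias


if __name__ == "__main__":
    over, under = estimation_bias()
    print(f"bias of max over actions (single critic): {over:+.3f}")
    print(f"bias of min over twin critics:            {under:+.3f}")
```

In this toy setting the first bias is positive and the second is negative, which is why a method that targets only overestimation can trade one error for the other.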