LG, AI | May 8, 2022

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li

arXiv:2205.03819v11.85 citations | h-index: 68 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses low sample efficiency and overestimation bias for reinforcement learning practitioners, but it is an incremental advance because it builds on existing actor-critic and Double Q-learning methods.

The paper tackles low sample efficiency and overestimation bias in actor-critic reinforcement learning by proposing SDQ-CAL, which modifies the Bellman operator with Advantage Learning and uses simultaneous double Q-learning. It achieves state-of-the-art performance in continuous control benchmarks, with experiments showing less biased value estimation.

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks. However, they still suffer from two nontrivial obstacles: low sample efficiency and overestimation bias. To this end, we propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL). SDQ-CAL improves Double Q-learning for off-policy actor-critic RL by modifying the Bellman optimality operator with Advantage Learning. Specifically, SDQ-CAL improves sample efficiency by modifying the reward so that optimal actions are easier to distinguish from the others based on experience. In addition, it mitigates overestimation by updating a pair of critics simultaneously using double estimators. Extensive experiments show that our algorithm yields less biased value estimation and achieves state-of-the-art performance on a range of continuous control benchmark tasks. We release the source code of our method at: https://github.com/LQNew/SDQ-CAL.
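The two ideas named in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation (see their repository for that). It assumes the standard Advantage Learning reward modification, r - alpha * (max_b Q(s, b) - Q(s, a)), and a tabular Double Q-learning target in which one table selects the next action and the other evaluates it. The function name `sdq_al_update`, the parameters `alpha` and `lr`, and the discrete state-action tables are all hypothetical choices for illustration. The abstract does not specify the "conservative" variant or the continuous-action actor-critic details, so neither is modeled here.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def sdq_al_update(q_a, q_b, s, a, r, s_next, done,
                  gamma=0.99, alpha=0.9, lr=0.1):
    """One simultaneous update of two tabular Q estimators (illustrative only).

    q_a, q_b: lists of lists, indexed as q[state][action].
    Returns the two targets that were used.
    """
    # Advantage Learning term (assumed form): the gap between the greedy
    # value and the value of the action actually taken, per estimator.
    gap_a = max(q_a[s]) - q_a[s][a]
    gap_b = max(q_b[s]) - q_b[s][a]

    # Double estimators: each table selects the next action,
    # and the other table evaluates it.
    if done:
        boot_a = boot_b = 0.0
    else:
        boot_a = q_b[s_next][argmax(q_a[s_next])]
        boot_b = q_a[s_next][argmax(q_b[s_next])]

    # Both targets are computed from the old tables before either is
    # changed, so the two estimators are updated simultaneously.
    target_a = r - alpha * gap_a + gamma * boot_a
    target_b = r - alpha * gap_b + gamma * boot_b

    q_a[s][a] += lr * (target_a - q_a[s][a])
    q_b[s][a] += lr * (target_b - q_b[s][a])
    return target_a, target_b
```

Subtracting the gap lowers the target for non-greedy actions only, which widens the margin between the greedy action and the rest. Evaluating the selected action with the other table is the usual Double Q-learning device for reducing overestimation.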

