ADDQ: Adaptive Distributional Double Q-Learning
This work addresses a fundamental problem in reinforcement learning, overestimation bias in $Q$-value estimates, and offers researchers and practitioners an incremental, easy-to-implement improvement to existing distributional algorithms.
The paper tackles overestimation bias in $Q$-value estimation, which slows the convergence of $Q$-learning and actor-critic methods, by proposing ADDQ, a method built on distributional reinforcement learning that controls overestimation in a locally adaptive way. Experiments in tabular, Atari, and MuJoCo environments are reported to show improvements, though the abstract gives no specific numerical results.
Bias problems in the estimation of $Q$-values are a well-known obstacle that slows down convergence of $Q$-learning and actor-critic methods. Part of the success of modern RL algorithms can be attributed to a direct or indirect overestimation reduction mechanism. We propose an easy-to-implement method built on top of distributional reinforcement learning (DRL) algorithms to deal with overestimation in a locally adaptive way. Our framework is simple to implement: existing distributional algorithms can be improved with a few lines of code. We provide theoretical evidence and use double $Q$-learning to show how to include locally adaptive overestimation control in existing algorithms. Experiments are provided for tabular, Atari, and MuJoCo environments.
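To make the overestimation issue concrete, the following minimal Python sketch contrasts the standard single-estimator target (the maximum over noisy $Q$-estimates) with the double $Q$-learning target (one table selects the action, an independent table evaluates it). The abstract does not specify ADDQ's adaptive rule, so the function `mixed_target` and its weight `beta` are purely hypothetical: they only illustrate what interpolating between the two targets could look like, and a locally adaptive method would presumably choose such a weight per state-action pair. All function names and the toy setup (true action values all zero, Gaussian estimation noise) are assumptions made for illustration.

```python
import random


def argmax(values):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def q_learning_target(q, reward, next_state, gamma):
    """Single-estimator target: the same table selects and evaluates the action."""
    return reward + gamma * max(q[next_state])


def double_q_target(q_select, q_eval, reward, next_state, gamma):
    """Double Q-learning target: q_select picks the action, q_eval scores it."""
    a_star = argmax(q_select[next_state])
    return reward + gamma * q_eval[next_state][a_star]


def mixed_target(q_a, q_b, reward, next_state, gamma, beta):
    """HYPOTHETICAL interpolation, not the paper's rule.

    beta = 1 recovers the single-estimator target,
    beta = 0 recovers the double Q-learning target.
    """
    single = q_learning_target(q_a, reward, next_state, gamma)
    double = double_q_target(q_a, q_b, reward, next_state, gamma)
    return beta * single + (1.0 - beta) * double


def simulate_bias(n_actions=10, n_trials=20000, noise=1.0, seed=0):
    """Toy experiment: every true action value is 0, estimates are pure noise.

    Returns the average single-estimator and double-estimator value of the
    next state; since the true value is 0, these averages are the biases.
    """
    rng = random.Random(seed)
    single_sum = 0.0
    double_sum = 0.0
    for _ in range(n_trials):
        # One state (index 0) with independently sampled estimates per table.
        q_a = [[rng.gauss(0.0, noise) for _ in range(n_actions)]]
        q_b = [[rng.gauss(0.0, noise) for _ in range(n_actions)]]
        single_sum += q_learning_target(q_a, 0.0, 0, 1.0)
        double_sum += double_q_target(q_a, q_b, 0.0, 0, 1.0)
    return single_sum / n_trials, double_sum / n_trials


if __name__ == "__main__":
    single_bias, double_bias = simulate_bias()
    print(f"single-estimator bias: {single_bias:.3f}")
    print(f"double-estimator bias: {double_bias:.3f}")
```

In this toy setting the single-estimator average comes out clearly positive even though every true value is zero, while the double-estimator average stays close to zero. In general double $Q$-learning can instead underestimate, which is one reason a state-dependent choice between the two regimes is attractive.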