Model-Augmented Q-learning
This work provides a method to improve the stability and performance of Q-learning for reinforcement learning practitioners by mitigating estimation biases.
The paper addresses the under- and overestimation bias in Q-learning by proposing Model-augmented Q-learning (MQL), a model-free reinforcement learning (MFRL) framework augmented with model-based components. MQL estimates Q-values, transitions, and rewards with a shared network, and uses the estimated reward in Q-learning, which yields a policy-invariant solution identical to the one obtained by learning with the true reward. Built on state-of-the-art off-policy MFRL methods, it improves their performance and convergence.
In recent years, $Q$-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect policy learning. To resolve this issue, we propose an MFRL framework that is augmented with components of model-based RL. Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network. We further use the estimated reward from the model estimators for $Q$-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution that is identical to the solution obtained by learning with the true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by using model-estimation errors. We experimentally validate MQL built upon state-of-the-art off-policy MFRL methods, and show that MQL substantially improves their performance and convergence. The proposed scheme is simple to implement and incurs no additional training cost.
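The three ingredients described above (a shared estimator, a TD target built from the estimated reward, and replay prioritization from model-estimation errors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the class `SharedEstimator`, the helpers `mql_td_target`, `model_error`, and `sampling_probs`, the discrete-action max bootstrap, the squared-error form of the model error, and the rule that samples with larger model error are drawn more often.

```python
import math
import random


class SharedEstimator:
    """Toy shared trunk with three heads: Q-values, reward, next state.

    Hypothetical forward-only illustration; weights are random, not trained.
    """

    def __init__(self, state_dim, num_actions, hidden=8, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.state_dim = state_dim
        self.num_actions = num_actions
        self.trunk = mat(hidden, state_dim)                # shared features
        self.q_head = mat(num_actions, hidden)             # Q(s, a)
        self.r_head = mat(num_actions, hidden)             # estimated reward
        self.t_head = mat(num_actions * state_dim, hidden)  # estimated next state

    @staticmethod
    def _matvec(matrix, vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

    def forward(self, state):
        # All three heads read the same hidden features.
        h = [math.tanh(z) for z in self._matvec(self.trunk, state)]
        q = self._matvec(self.q_head, h)
        r_hat = self._matvec(self.r_head, h)
        flat = self._matvec(self.t_head, h)
        d = self.state_dim
        s_next_hat = [flat[a * d:(a + 1) * d] for a in range(self.num_actions)]
        return q, r_hat, s_next_hat


def mql_td_target(r_hat, q_next_values, gamma=0.99, done=False):
    """TD target that uses the estimated reward instead of the observed one.

    Assumes discrete actions and a standard max bootstrap.
    """
    bootstrap = 0.0 if done else gamma * max(q_next_values)
    return r_hat + bootstrap


def model_error(r_hat, r, s_next_hat, s_next):
    """Squared reward error plus squared transition error (assumed form)."""
    reward_err = (r_hat - r) ** 2
    transition_err = sum((a - b) ** 2 for a, b in zip(s_next_hat, s_next))
    return reward_err + transition_err


def sampling_probs(errors, eps=1e-6):
    """Replay sampling probabilities proportional to model error (assumed rule)."""
    weights = [e + eps for e in errors]
    total = sum(weights)
    return [w / total for w in weights]
```

A possible usage pattern under the same assumptions: call `forward` on a stored state, pick the reward and next-state estimates for the stored action, pass the estimated reward to `mql_td_target`, and pass the per-sample values of `model_error` to `sampling_probs` when drawing the next minibatch.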