LG · Mar 15, 2023

Smoothed Q-learning

arXiv:2303.08631v1 · 5 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a known bottleneck in reinforcement learning, namely algorithm stability and efficiency, though it appears incremental because it builds on existing Q-learning variants.

The paper tackles the overestimation problem in Q-learning by introducing an alternative algorithm that replaces the max operation with an average. The result is a provably convergent off-policy method that mitigates overestimation while retaining a convergence speed similar to that of standard Q-learning.

In reinforcement learning, the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double Q-learning is a provably convergent alternative that mitigates some of the overestimation issues, though sometimes at the expense of slower convergence. We introduce an alternative algorithm that replaces the max operation with an average, also resulting in a provably convergent off-policy algorithm, which can mitigate overestimation yet retain convergence similar to that of standard Q-learning.
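The difference between the two update targets can be illustrated with a minimal tabular sketch. The abstract only states that the max is replaced by an average; the softmax weighting, the `inv_temperature` parameter, and all function names below are assumptions made for illustration, not the paper's specification.

```python
import numpy as np


def softmax_weights(q_values, inv_temperature):
    """Hypothetical averaging weights: a softmax over the action values."""
    z = inv_temperature * (q_values - np.max(q_values))  # shift for stability
    w = np.exp(z)
    return w / w.sum()


def max_target(q_next, reward, gamma):
    """Standard Q-learning target: r + gamma * max_a Q(s', a)."""
    return reward + gamma * float(np.max(q_next))


def smoothed_target(q_next, reward, gamma, inv_temperature=1.0):
    """Averaged target: r + gamma * sum_a w(a) * Q(s', a)."""
    w = softmax_weights(q_next, inv_temperature)
    return reward + gamma * float(np.dot(w, q_next))


def td_update(Q, s, a, reward, s_next, alpha, gamma, inv_temperature=None):
    """One in-place tabular update; inv_temperature=None uses the max target."""
    if inv_temperature is None:
        target = max_target(Q[s_next], reward, gamma)
    else:
        target = smoothed_target(Q[s_next], reward, gamma, inv_temperature)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def overestimation_demo(n_trials=2000, n_actions=5, inv_temperature=1.0, seed=0):
    """Mean of max vs. weighted average over noisy estimates whose true value is 0."""
    rng = np.random.default_rng(seed)
    noisy = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
    z = inv_temperature * (noisy - noisy.max(axis=1, keepdims=True))
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)
    max_bias = float(noisy.max(axis=1).mean())
    avg_bias = float((w * noisy).sum(axis=1).mean())
    return max_bias, avg_bias
```

Because a weighted average of the action values can never exceed their maximum, the averaged target is bounded above by the max target for any choice of weights. In this sketch, setting `inv_temperature` to 0 gives a uniform average, and large values approach the standard max target.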

