LGSY Jun 14, 2024

Finite-Time Analysis of Simultaneous Double Q-learning

arXiv:2406.09946v2
Originality: Incremental advance
AI Analysis

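This work addresses a fundamental issue for reinforcement learning practitioners, overestimation bias in Q-learning, but the advance is incremental because it modifies an existing method (double Q-learning) rather than introducing a new one.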

The paper tackles the overestimation bias in Q-learning by proposing simultaneous double Q-learning (SDQ), which eliminates random selection between estimators and enables a novel switching system framework for finite-time analysis, showing faster convergence than double Q-learning while mitigating bias.

$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double $Q$-learning employs two independent $Q$-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double $Q$-learning, called simultaneous double $Q$-learning (SDQ), with its finite-time analysis. SDQ eliminates the need for random selection between the two $Q$-estimators, and this modification allows us to analyze double $Q$-learning through the lens of a novel switching system framework facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double $Q$-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
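To make the difference between the two update schemes concrete, here is a minimal tabular sketch in Python. The abstract only states that SDQ removes the random selection between the two estimators; the exact target used here (each estimator picks the greedy action with its own table and evaluates it with the other table) is an assumption borrowed from standard double Q-learning, and the paper's precise update may differ. The function names, the list-of-lists table layout, and the parameters `alpha` and `gamma` are illustrative choices, not taken from the paper.

```python
import random


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def double_q_update(qa, qb, s, a, r, s_next, alpha, gamma, rng=random):
    """Standard double Q-learning step: a coin flip picks ONE table to update."""
    if rng.random() < 0.5:
        learner, evaluator = qa, qb
    else:
        learner, evaluator = qb, qa
    # The learner selects the greedy action; the other table evaluates it.
    a_star = argmax(learner[s_next])
    target = r + gamma * evaluator[s_next][a_star]
    learner[s][a] += alpha * (target - learner[s][a])


def simultaneous_double_q_update(qa, qb, s, a, r, s_next, alpha, gamma):
    """Assumed SDQ-style step: BOTH tables are updated, with no coin flip."""
    a_star_a = argmax(qa[s_next])
    a_star_b = argmax(qb[s_next])
    # Compute both targets before writing, so each update sees the old tables.
    target_a = r + gamma * qb[s_next][a_star_a]
    target_b = r + gamma * qa[s_next][a_star_b]
    qa[s][a] += alpha * (target_a - qa[s][a])
    qb[s][a] += alpha * (target_b - qb[s][a])
```

In this sketch, every observed transition `(s, a, r, s_next)` moves both tables under the simultaneous rule, whereas the coin-flip rule moves only one of them per step. That is one intuitive reading of why the abstract reports faster convergence for SDQ, though the paper's own analysis should be consulted for the actual argument.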

