LG, AI | Feb 3, 2024

MinMaxMin $Q$-learning

arXiv:2402.05951v3 | h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses a specific problem in reinforcement learning for continuous control and offers an incremental improvement over existing algorithms.

The paper tackles overestimation bias in conservative reinforcement learning algorithms by introducing MinMaxMin Q-learning, an optimistic Actor-Critic method that uses disagreement among Q-networks to adjust Q-targets and sampling rules, resulting in consistent performance improvements over DDPG, TD3, and TD7 across MuJoCo and Bullet environments.

MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimates exceed the true $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks, measured as the mini-batch MaxMin $Q$-network distance, which is added to the $Q$-target and also used as the sampling rule for prioritized experience replay. We implement MinMaxMin on top of TD3 and TD7 and test it rigorously against state-of-the-art continuous-space algorithms (DDPG, TD3, and TD7) across popular MuJoCo and Bullet environments. The results show a consistent performance improvement of MinMaxMin over DDPG, TD3, and TD7 across all tested tasks.
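The abstract does not give the exact target formula, so the following is only a minimal NumPy sketch under stated assumptions. It assumes an ensemble of critics evaluated at the next state and target action, a per-sample MaxMin gap (maximum minus minimum over critics) added to the TD3-style minimum with a hypothetical scale `beta`, and the same gap reused as a proportional prioritized-replay priority (priority exponent of 1). The names `minmaxmin_targets`, `sampling_probs`, `beta`, and `eps` are hypothetical and not taken from the paper; the paper's reduction over the mini-batch may differ.

```python
import numpy as np


def minmaxmin_targets(next_q, rewards, dones, gamma=0.99, beta=1.0):
    """Hypothetical optimistic Q-target sketch (not the paper's exact formula).

    next_q: shape (n_critics, batch), each critic's estimate Q_i(s', a').
    """
    q_min = next_q.min(axis=0)   # conservative TD3-style estimate over critics
    q_max = next_q.max(axis=0)   # most optimistic critic per sample
    gap = q_max - q_min          # per-sample MaxMin disagreement (>= 0)
    # Assumption: the disagreement is added to the conservative estimate, scaled by beta.
    targets = rewards + gamma * (1.0 - dones) * (q_min + beta * gap)
    return targets, gap


def sampling_probs(gap, eps=1e-6):
    """Assumed proportional prioritization: larger disagreement, higher probability."""
    priorities = gap + eps       # eps keeps zero-gap transitions sampleable
    return priorities / priorities.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    next_q = rng.normal(size=(4, 8))   # toy data: 4 critics, mini-batch of 8
    rewards = rng.normal(size=8)
    dones = np.zeros(8)
    targets, gap = minmaxmin_targets(next_q, rewards, dones)
    idx = rng.choice(8, size=4, replace=True, p=sampling_probs(gap))
    print(targets.round(3), idx)
```

Reusing one quantity for both the target bonus and the replay priority mirrors the abstract's description: transitions on which the critics disagree most receive both a larger optimistic correction and more frequent replay.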
