LG | AI | ML | Nov 17, 2024

Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

arXiv:2411.11099v1 | h-index: 1 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This paper addresses coordination failures in cooperative multi-agent tasks and offers a method to mitigate suboptimal collective decisions. It appears incremental because it builds on existing Q-learning frameworks.

The paper tackles relative over-generalization in decentralized multi-agent reinforcement learning, where agents undervalue optimal joint actions. It introduces MaxMax Q-Learning (MMQ) to improve coordination and reports that MMQ frequently outperforms baselines, with better convergence and sample efficiency.
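To make relative over-generalization concrete, the sketch below uses a small two-agent matrix game. The payoff values are an illustrative assumption (a climb-style game), not taken from the paper, and the helper names are hypothetical. It shows how an agent that averages over a uniformly random partner prefers a safe but suboptimal action, while an optimistic (max) estimate recovers the action belonging to the best joint outcome.

```python
import numpy as np

# Illustrative shared payoff (assumed values, not from the paper).
# Rows: agent 1's action, columns: agent 2's action.
PAYOFF = np.array(
    [[11.0, -30.0, 0.0],
     [-30.0, 7.0, 6.0],
     [0.0, 0.0, 5.0]]
)


def average_values(payoff):
    """Value of each of agent 1's actions if the partner acts uniformly at random."""
    return payoff.mean(axis=1)


def optimistic_values(payoff):
    """Value of each of agent 1's actions if the partner always picks the best reply."""
    return payoff.max(axis=1)


avg_choice = int(np.argmax(average_values(PAYOFF)))
opt_choice = int(np.argmax(optimistic_values(PAYOFF)))

print("averaging agent picks action", avg_choice)
print("optimistic agent picks action", opt_choice)
```

Under these assumed payoffs, the averaging agent is pulled away from the row containing the highest joint payoff because miscoordination in that row is heavily penalized, which is the undervaluation the summary describes.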

In decentralized multi-agent reinforcement learning, agents learning in isolation can lead to relative over-generalization (RO), where optimal joint actions are undervalued in favor of suboptimal ones. This hinders effective coordination in cooperative tasks, as agents tend to choose actions that are individually rational but collectively suboptimal. To address this issue, we introduce MaxMax Q-Learning (MMQ), which employs an iterative process of sampling and evaluating potential next states, selecting those with maximal Q-values for learning. This approach refines approximations of ideal state transitions, aligning more closely with the optimal joint policy of collaborating agents. We provide theoretical analysis supporting MMQ's potential and present empirical evaluations across various environments susceptible to RO. Our results demonstrate that MMQ frequently outperforms existing baselines, exhibiting enhanced convergence and sample efficiency.
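The sampling-and-selection step described in the abstract can be sketched in a tabular setting as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how candidate next states are generated, so here they are simply passed in as a list, and the function names, the tabular Q-table, and the hyperparameters `alpha` and `gamma` are all hypothetical choices for illustration.

```python
import numpy as np


def maxmax_target(q_table, reward, candidate_next_states, gamma=0.99):
    """Bootstrap from the candidate next state with the highest max-Q value.

    The inner max is over actions, the outer max is over sampled next states.
    """
    best_next_value = max(q_table[s].max() for s in candidate_next_states)
    return reward + gamma * best_next_value


def q_update(q_table, state, action, reward, candidate_next_states,
             alpha=0.1, gamma=0.99):
    """One Q-learning step toward the max-max target (assumed update rule)."""
    target = maxmax_target(q_table, reward, candidate_next_states, gamma)
    q_table[state, action] += alpha * (target - q_table[state, action])
    return float(q_table[state, action])


# Tiny usage example: 3 states, 2 actions, two sampled candidate next states.
q = np.zeros((3, 2))
q[1] = [1.0, 2.0]
q[2] = [5.0, 0.0]
new_value = q_update(q, state=0, action=0, reward=1.0,
                     candidate_next_states=[1, 2], alpha=0.5, gamma=0.5)
print(new_value)
```

Taking the maximum over candidates makes the target optimistic about which transition the other agents will produce, which is the intuition behind aligning each agent's estimate with the optimal joint policy.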

