GTLGMATHDSJun 24, 2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

arXiv:2106.12928v140 citations
Originality Incremental advance
AI Analysis

This addresses the open problem of equilibrium selection in competitive multi-agent settings, offering theoretical convergence guarantees without parameter fine-tuning, though it is incremental as it builds on prior work in weighted potential games.

The paper tackles the problem of exploration-exploitation balance in competitive multi-agent learning by studying smooth Q-learning, showing it always converges to the unique quantal-response equilibrium in weighted zero-sum polymatrix games with heterogeneous agents and positive exploration rates, with experiments in network zero-sum games providing algorithmic guarantees for equilibrium selection.

The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes