LG AI MA SY OC · Dec 1, 2024

Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

arXiv:2412.00661v4 · 9 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses scalability in cooperative MARL for large-scale systems and offers a practical solution with provable guarantees, though the contribution is incremental because it builds on existing mean-field methods.

The paper tackles the exponential complexity of multi-agent reinforcement learning by proposing SUBSAMPLE-MFQ, a mean-field Q-learning algorithm that learns a policy in time polynomial in a subsample size k. The learned policy converges to the optimal policy at a rate of Õ(1/√k), independent of the total number of agents n.
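To make the exponential-versus-polynomial contrast concrete, the sketch below counts states under one common assumption from the mean-field literature (not confirmed as this paper's exact parameterization): each agent has |S| local states, and the mean-field summary of k agents is a histogram of how many agents occupy each state. Tracking agents individually gives |S|^n joint states, while the number of distinct k-agent histograms is C(k + |S| − 1, |S| − 1), a polynomial in k of degree |S| − 1. The helper names are hypothetical. The last function shows the arithmetic behind the rate: ignoring log factors, quadrupling k halves the 1/√k term.

```python
from math import comb, sqrt


def joint_state_count(num_local_states: int, n: int) -> int:
    """Joint local-state space size when every agent is tracked individually."""
    return num_local_states ** n


def mean_field_state_count(num_local_states: int, k: int) -> int:
    """Distinct histograms of k agents over num_local_states states (stars and bars)."""
    return comb(k + num_local_states - 1, num_local_states - 1)


def rate_term(k: int) -> float:
    """The 1/sqrt(k) factor of an O~(1/sqrt(k)) bound, with log factors dropped."""
    return 1.0 / sqrt(k)


# Hypothetical example with |S| = 3 local states per agent.
for k in (5, 10, 20):
    print(k, joint_state_count(3, k), mean_field_state_count(3, k))
# 5 243 21
# 10 59049 66
# 20 3486784401 231
```

The histogram count grows quadratically in k here (degree |S| − 1 = 2), while the per-agent joint count grows exponentially, which is why a table indexed by a mean-field summary can stay small.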

Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
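The abstract describes two pieces: learning with a subsample of k agents, and a decentralized randomized policy for all n agents. The following is a minimal, simplified sketch of that pattern, not SUBSAMPLE-MFQ itself. It assumes (as an illustration, not a detail confirmed by the abstract) that Q is learned on a k-agent system and that, at execution time, each agent samples k − 1 peers uniformly at random and acts greedily on its own state plus their histogram. It omits the global decision-making component the abstract mentions. The toy dynamics, reward, `S`, `ACTIONS`, and every function name are hypothetical.

```python
import random
from collections import defaultdict

S = 3             # hypothetical number of local states per agent
ACTIONS = (0, 1)  # hypothetical local actions: 0 = step down, 1 = step up


def histogram(states, num_states=S):
    """Mean-field summary: how many agents occupy each local state."""
    counts = [0] * num_states
    for s in states:
        counts[s] += 1
    return tuple(counts)


def step(s, a):
    """Hypothetical local dynamics: move one level up or down, clipped to [0, S-1]."""
    return min(s + 1, S - 1) if a == 1 else max(s - 1, 0)


def shared_reward(states):
    """Hypothetical cooperative reward favoring a moderate fraction in the top state."""
    frac = sum(s == S - 1 for s in states) / len(states)
    return frac - 0.8 * frac ** 2


class MeanFieldQ:
    """Tabular Q-learning keyed by (own state, histogram of other sampled agents, action)."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def best_action(self, s, hist):
        # Ties resolve to the first action in ACTIONS.
        return max(ACTIONS, key=lambda a: self.q[(s, hist, a)])

    def update(self, s, hist, a, r, s_next, hist_next):
        # Standard one-step Q-learning target on the mean-field key.
        target = r + self.gamma * max(self.q[(s_next, hist_next, b)] for b in ACTIONS)
        self.q[(s, hist, a)] += self.alpha * (target - self.q[(s, hist, a)])


def others(states, i):
    """States of every agent except agent i."""
    return states[:i] + states[i + 1:]


def train_on_subsystem(k, episodes=300, horizon=20, eps=0.2, seed=0):
    """Learn Q on a k-agent system, so the table size depends on k, not n."""
    rng = random.Random(seed)
    learner = MeanFieldQ()
    for _ in range(episodes):
        states = [rng.randrange(S) for _ in range(k)]
        for _ in range(horizon):
            # Epsilon-greedy actions for each of the k agents.
            acts = [rng.choice(ACTIONS) if rng.random() < eps
                    else learner.best_action(states[i], histogram(others(states, i)))
                    for i in range(k)]
            nxt = [step(s, a) for s, a in zip(states, acts)]
            r = shared_reward(nxt)
            for i in range(k):
                learner.update(states[i], histogram(others(states, i)), acts[i], r,
                               nxt[i], histogram(others(nxt, i)))
            states = nxt
    return learner


def decentralized_action(states, i, k, learner, rng):
    """Deploy to n >= k agents: agent i samples k-1 peers uniformly and acts greedily."""
    sampled = rng.sample(others(states, i), k - 1)
    return learner.best_action(states[i], histogram(sampled))


# Usage: learn with k = 4, then act in a system of n = 100 agents.
K, N = 4, 100
learner = train_on_subsystem(K)
rng = random.Random(1)
states = [rng.randrange(S) for _ in range(N)]
actions = [decentralized_action(states, i, K, learner, rng) for i in range(N)]
```

Every Q-table key pairs one local state with a histogram of k − 1 agents, so the table can never exceed |S| · C(k + |S| − 2, |S| − 1) · |A| entries regardless of n; here that is 3 · 10 · 2 = 60.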
