LG · AI · MA · Feb 15, 2023

Scalable Multi-Agent Reinforcement Learning with General Utilities

Berkeley
arXiv:2302.07938v2 · 3 citations · h-index: 38
AI Analysis

This paper addresses efficient learning in multi-agent systems where agents have limited observability. The authors present it as the first result on multi-agent RL with general utilities that does not require full observability.

The paper tackles scalable multi-agent reinforcement learning with general utilities in a setting where agents lack full observability. It proposes a distributed policy gradient algorithm that exploits the spatial correlation decay of the network. The algorithm converges, with high probability, to ε-stationarity using Õ(ε⁻²) samples, up to an approximation error that decreases exponentially in the communication radius.

We study scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without requiring any agent in the team to have full observability. By exploiting the spatial correlation decay property of the network structure, we propose a scalable distributed policy gradient algorithm with shadow reward and localized policy that consists of three steps: (1) shadow reward estimation, (2) truncated shadow Q-function estimation, and (3) truncated policy gradient estimation and policy update. Our algorithm converges, with high probability, to $ε$-stationarity with $\widetilde{\mathcal{O}}(ε^{-2})$ samples, up to an approximation error that decreases exponentially in the communication radius. This is the first result in the literature on multi-agent RL with general utilities that does not require full observability.
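To make step (1) concrete, here is a minimal single-agent sketch of shadow reward estimation. It is not the paper's algorithm: it ignores the network, the truncation to a communication radius, and the policy update. It assumes the usual definition from the general-utilities literature, in which the shadow reward is the gradient of the utility with respect to the occupancy measure. The function names, the quadratic utility, and the `target` occupancy are all hypothetical choices made for illustration.

```python
import numpy as np

def empirical_occupancy(trajectory, n_states, n_actions, gamma):
    """Discounted state-action visitation estimate from one trajectory.

    trajectory: list of (state, action) index pairs, one per time step.
    Each visit at time t contributes gamma**t to its (state, action) entry.
    """
    lam = np.zeros((n_states, n_actions))
    for t, (s, a) in enumerate(trajectory):
        lam[s, a] += gamma ** t
    return lam

def quadratic_utility(lam, target):
    """Hypothetical nonlinear utility: negative squared distance to a target occupancy."""
    return -0.5 * np.sum((lam - target) ** 2)

def shadow_reward(lam, target):
    """Gradient of quadratic_utility with respect to lam (assumed shadow-reward definition)."""
    return target - lam

# Toy data: 2 states, 2 actions, a 3-step trajectory (all values are illustrative).
trajectory = [(0, 1), (1, 0), (0, 1)]
lam_hat = empirical_occupancy(trajectory, n_states=2, n_actions=2, gamma=0.5)
target = np.full((2, 2), 0.25)
r_shadow = shadow_reward(lam_hat, target)

print(lam_hat)   # [[0.   1.25] [0.5  0.  ]]
print(r_shadow)  # [[ 0.25 -1.  ] [-0.25  0.25]]
```

The resulting `r_shadow` table plays the role that an ordinary reward table plays in standard RL. If the utility were linear in the occupancy measure, its gradient would be a fixed reward table that does not depend on the current policy, which is the standard cumulative-reward setting.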

