LG · MA · ML · Mar 8, 2021

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

arXiv:2103.04972v1 · 29 citations
Originality: Incremental advance
AI Analysis

This work addresses communication efficiency in cooperative multi-agent systems, with applications such as advertising and federated learning. The contribution is incremental: it generalizes existing ideas from the bandit literature to MDPs.

The paper tackles cooperative multi-agent reinforcement learning with function approximation. It shows that near-optimal no-regret learning is achievable with a fixed constant communication budget, and that Pareto-optimal no-regret learning is achievable in heterogeneous settings with limited communication.

Reinforcement learning in cooperative multi-agent settings has recently advanced significantly in its scope, with applications in cooperative estimation for advertising, dynamic treatment regimes, distributed control, and federated learning. In this paper, we discuss the problem of cooperative multi-agent RL with function approximation, where a group of agents communicates with each other to jointly solve an episodic MDP. We demonstrate that via careful message-passing and cooperative value iteration, it is possible to achieve near-optimal no-regret learning even with a fixed constant communication budget. Next, we demonstrate that even in heterogeneous cooperative settings, it is possible to achieve Pareto-optimal no-regret learning with limited communication. Our work generalizes several ideas from the multi-agent contextual and multi-armed bandit literature to MDPs and reinforcement learning.
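To make the term "no-regret" concrete, here is a minimal sketch of how cumulative group regret could be computed for a group of agents. It is an illustration under stated assumptions, not the paper's algorithm or notation: the names `group_regret`, `average_regret`, and `toy_values` are hypothetical, the data is made up, and all agents are assumed to share one optimal value (the homogeneous case). Learning is "no-regret" when the cumulative regret grows sublinearly, so the average regret per agent-episode tends to zero.

```python
from typing import List, Sequence


def group_regret(optimal_value: float, achieved: Sequence[Sequence[float]]) -> float:
    """Cumulative group regret: shortfall from optimal, summed over agents and episodes.

    achieved[m][t] is the (hypothetical) expected return of agent m's policy in episode t.
    """
    return sum(optimal_value - v for agent_values in achieved for v in agent_values)


def average_regret(optimal_value: float, achieved: Sequence[Sequence[float]]) -> float:
    """Group regret divided by the total number of agent-episodes."""
    n = sum(len(agent_values) for agent_values in achieved)
    return group_regret(optimal_value, achieved) / n


def toy_values(episodes: int, optimal_value: float = 1.0) -> List[List[float]]:
    """Made-up data: two agents whose per-episode shortfall shrinks like 1/sqrt(t)."""
    return [
        [optimal_value - 1.0 / (t ** 0.5) for t in range(1, episodes + 1)]
        for _ in range(2)
    ]


if __name__ == "__main__":
    for episodes in (100, 10_000):
        values = toy_values(episodes)
        print(episodes, group_regret(1.0, values), average_regret(1.0, values))
```

In this toy run the cumulative regret keeps growing with the number of episodes while the average regret shrinks, which is the behaviour a no-regret guarantee describes.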

