LG, MA, OC · Oct 25, 2021

Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning

arXiv:2110.12603v1 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses the complexity of multi-agent reinforcement learning under information asymmetry, offering a general framework that can guide practical deep-MARL network designs, though it appears incremental as it generalizes existing methods.

The paper tackles the challenge of finding optimal policies in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) by developing a compression framework with approximate common and private state representations. It derives an optimality gap for dynamic programming on the approximate states, expressed in terms of the approximation errors and the remaining time steps.

Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard, with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting, where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme.
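To make the "doubly exponentially" claim concrete, the sketch below counts deterministic joint policies in a toy Dec-POMDP. It is an illustration under stated assumptions, not a construction from the paper: each agent is assumed to map its own observation history to an action, all agents are assumed to share the same action and observation set sizes, and the function names are hypothetical.

```python
def num_observation_histories(num_obs: int, horizon: int) -> int:
    """Count one agent's observation histories of length 0 .. horizon-1.

    There are num_obs ** t histories of length t, so the total grows
    exponentially in the horizon.
    """
    return sum(num_obs ** t for t in range(horizon))


def num_deterministic_policies(num_actions: int, num_obs: int, horizon: int) -> int:
    """Count deterministic policies for one agent.

    A deterministic policy picks one action per observation history, giving
    num_actions ** (number of histories): exponential in an exponential.
    """
    return num_actions ** num_observation_histories(num_obs, horizon)


def num_joint_policies(num_agents: int, num_actions: int, num_obs: int, horizon: int) -> int:
    """Count joint policies, assuming identical agents (an illustrative assumption)."""
    return num_deterministic_policies(num_actions, num_obs, horizon) ** num_agents


if __name__ == "__main__":
    # Toy sizes chosen for illustration only: 2 agents, 2 actions, 2 observations.
    for horizon in range(1, 7):
        count = num_joint_policies(num_agents=2, num_actions=2, num_obs=2, horizon=horizon)
        print(f"horizon={horizon}: joint policy count has {len(str(count))} decimal digits")
```

Even with two actions and two observations per agent, brute-force enumeration becomes impractical after a handful of steps, which is the motivation for compressing histories into approximate common and private state representations.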

