MA · AI · LG · May 13, 2021

SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning

arXiv:2105.06228v2 · 11 citations
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation of multi-agent systems in which full observability is infeasible, offering a domain-specific improvement for settings such as games or robotics.

The paper tackles partially observable cooperative multi-agent reinforcement learning by proposing SIDE, a value decomposition framework that removes the need for global state information during training. On StarCraft II micromanagement tasks, it achieves results superior to many baselines in some complex scenarios.

As one solution to decentralized partially observable Markov decision process (Dec-POMDP) problems, value decomposition has achieved significant results recently. However, most value decomposition methods require the fully observable state of the environment during training, which is not feasible in scenarios where only incomplete and noisy observations are available. We therefore propose a novel value decomposition framework, State Inference for value DEcomposition (SIDE), which eliminates the need to know the global state by simultaneously solving the two problems of optimal control and state inference. SIDE can be extended to any value decomposition method to tackle partially observable problems. Comparing against different algorithms on StarCraft II micromanagement tasks, we verify that, even without access to the state, SIDE can infer from past local observations a current state that contributes to the reinforcement learning process, and can even achieve results superior to many baselines in some complex scenarios.
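The abstract does not describe SIDE's architecture, so the following is only a minimal sketch of the general idea: a value decomposition mixer conditioned on a state inferred from local observations rather than on the true global state. The linear encoder, the QMIX-style non-negative mixing weights, and every name here (`infer_state`, `mix`, `enc_weights`, `hyper_weights`, `hyper_bias`) are assumptions made for illustration, not the authors' method.

```python
import random

def infer_state(observations, enc_weights):
    """Hypothetical state inference: a linear map from the concatenated
    local observations of all agents to a latent state vector."""
    flat = [x for obs in observations for x in obs]
    return [sum(w * x for w, x in zip(row, flat)) for row in enc_weights]

def mix(agent_qs, latent_state, hyper_weights, hyper_bias):
    """QMIX-style mixing conditioned on the inferred latent state instead of
    the global state. abs() keeps each mixing weight non-negative, so Q_tot
    is non-decreasing in every agent's Q-value."""
    weights = [abs(sum(h * z for h, z in zip(row, latent_state)))
               for row in hyper_weights]
    bias = sum(b * z for b, z in zip(hyper_bias, latent_state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias

def make_matrix(rows, cols, rng):
    """Random matrix standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_agents, obs_dim, latent_dim = 3, 4, 2

# Untrained, randomly initialised parameters (assumed shapes).
enc_weights = make_matrix(latent_dim, n_agents * obs_dim, rng)
hyper_weights = make_matrix(n_agents, latent_dim, rng)
hyper_bias = [rng.uniform(-1, 1) for _ in range(latent_dim)]

# One local observation and one chosen-action Q-value per agent.
observations = make_matrix(n_agents, obs_dim, rng)
agent_qs = [0.5, -0.2, 1.0]

z = infer_state(observations, enc_weights)
q_tot = mix(agent_qs, z, hyper_weights, hyper_bias)
print(q_tot)
```

In a trained system the encoder and mixer parameters would be learned jointly; the sketch only shows how the global state input of a mixer can be replaced by an inferred one.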

