AI · Mar 15, 2012

Anytime Planning for Decentralized POMDPs using Expectation Maximization

arXiv:1203.3490v1 · 42 citations
Originality: Incremental advance
AI Analysis

This paper addresses the complexity of multi-agent sequential decision-making in infinite-horizon settings, offering a new method that improves incrementally on existing approaches.

The paper tackles the problem of infinite-horizon decentralized POMDPs by recasting policy optimization as inference in dynamic Bayesian networks, using Expectation Maximization to achieve competitive results against state-of-the-art solvers in benchmark domains.

Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.
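To make the "inference in a mixture of DBNs" idea concrete, here is a minimal sketch of a construction that is common in the planning-as-inference literature. It is an assumption here, not something the abstract spells out: rewards are rescaled to [0, 1] and read as the probability of a binary reward variable, and the horizon T is drawn from the prior P(T) = (1 - gamma) * gamma^T. Under that assumption the mixture likelihood is an affine function of the discounted value, so maximizing one maximizes the other. The 3-state chain, the rewards, and every function name below are hypothetical; the fixed chain merely stands in for the environment combined with a fixed joint controller, and the sketch does not implement the paper's EM updates.

```python
import numpy as np

def discounted_value(P, r, b0, gamma, horizon):
    """Truncated discounted value: sum over t < horizon of gamma^t * E[r_t]."""
    value, b = 0.0, b0.copy()
    for t in range(horizon):
        value += (gamma ** t) * float(b @ r)
        b = b @ P  # propagate the state distribution one step
    return value

def mixture_likelihood(P, r, b0, gamma, horizon):
    """Mixture likelihood of the binary reward being 1.

    Assumed construction: P(r_hat = 1 | s) = (r(s) - r_min) / (r_max - r_min),
    with time prior P(T) = (1 - gamma) * gamma^T (truncated at `horizon`).
    """
    r_hat = (r - r.min()) / (r.max() - r.min())
    lik, b = 0.0, b0.copy()
    for T in range(horizon):
        lik += (1.0 - gamma) * (gamma ** T) * float(b @ r_hat)
        b = b @ P
    return lik

# Hypothetical 3-state chain induced by some fixed joint policy.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
r = np.array([-1.0, 0.0, 2.0])   # per-state reward (illustrative values)
b0 = np.array([1.0, 0.0, 0.0])   # initial state distribution
gamma, horizon = 0.9, 200

V = discounted_value(P, r, b0, gamma, horizon)
L = mixture_likelihood(P, r, b0, gamma, horizon)

# Affine map from likelihood back to value, using the same truncation.
geom = (1.0 - gamma ** horizon) / (1.0 - gamma)
V_from_L = (r.max() - r.min()) * L / (1.0 - gamma) + r.min() * geom
```

Because `V_from_L` is an increasing affine function of `L`, any procedure that raises the likelihood (EM being one option) also raises the discounted value in this toy setting.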

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
