AI, Mar 15, 2012

Rollout Sampling Policy Iteration for Decentralized POMDPs

Feng Wu, Shlomo Zilberstein, Xiaoping Chen

arXiv:1203.3528v1, 28 citations

Originality: Incremental advance

AI Analysis

This work addresses scalability in multi-agent systems, with applications in domains such as robotics and autonomous vehicles. It is an incremental advance in that it builds on existing rollout and sampling techniques.

The paper tackles the scalability of multi-agent decision problems formalized as decentralized partially observable Markov decision processes (DEC-POMDPs). It introduces DecRSPI, an algorithm that combines Monte Carlo sampling with a compact policy representation. The algorithm's running time is linear in the number of agents, and it solves larger problems than existing planning methods can handle.

We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.
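To make the idea of a rollout estimate concrete, here is a minimal Python sketch of evaluating a fixed joint policy by Monte Carlo simulation when only a black-box simulator is available. It is not the DecRSPI algorithm itself, which additionally samples belief states and improves the policy. The names `rollout_value`, `toy_simulate`, and `follow_last_obs`, along with the two-state toy problem, are assumptions made up for illustration and do not come from the paper.

```python
import random


def rollout_value(simulate, policies, start_state, horizon, num_rollouts, rng):
    """Monte Carlo estimate of the expected return of a joint policy.

    simulate(state, actions, rng) is a hypothetical generative model that
    returns (next_state, observations, reward). No explicit transition or
    observation tables are needed.
    """
    total = 0.0
    for _ in range(num_rollouts):
        state = start_state
        # Decentralized execution: each agent sees only its own observations.
        histories = [() for _ in policies]
        for _ in range(horizon):
            actions = tuple(pi(h) for pi, h in zip(policies, histories))
            state, observations, reward = simulate(state, actions, rng)
            histories = [h + (o,) for h, o in zip(histories, observations)]
            total += reward
    return total / num_rollouts


def toy_simulate(state, actions, rng):
    """Illustrative two-state problem (an assumption, not from the paper).

    The team earns 1.0 when every agent's action matches the hidden state.
    Each agent then receives a noisy private observation of the next state.
    """
    reward = 1.0 if all(a == state for a in actions) else 0.0
    next_state = rng.randrange(2)
    observations = tuple(
        next_state if rng.random() < 0.8 else 1 - next_state for _ in actions
    )
    return next_state, observations, reward


def follow_last_obs(history):
    """Local policy: act on the most recent private observation."""
    return history[-1] if history else 0


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = rollout_value(
        toy_simulate, [follow_last_obs, follow_last_obs], 0, 3, 1000, rng
    )
    print(f"estimated 3-step value: {estimate:.3f}")
```

In this sketch the cost of each simulated step grows with the number of local policies queried, which is in the spirit of the linear scaling in the number of agents that the abstract claims; the paper's actual complexity analysis is not reproduced here.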
