AI · May 19, 2023

Monte-Carlo Search for an Equilibrium in Dec-POMDPs

Yang You, Vincent Thomas, Francis Colas, Olivier Buffet

arXiv:2305.11811v1 · 3.92 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of designing controllers for collaborative agents under uncertainty, offering a simulation-based approach that is more accessible than seeking global optima, though it is incremental as it adapts existing equilibrium-seeking methods to cases with only a simulator.

The paper tackles the problem of finding Nash equilibria in decentralized partially observable Markov decision processes (Dec-POMDPs) using only a generative model. It shows that its Monte-Carlo-based method (MC-JESP) is competitive with existing solvers and outperforms many offline methods that rely on explicit models.

Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium -- each agent policy being a best response to the other agents -- is more accessible, and has made it possible to address infinite-horizon problems with solutions in the form of finite state controllers (FSCs). In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent's FSC node by node. A related process is used to heuristically derive initial FSCs. Experiments with benchmarks show that MC-JESP is competitive with existing Dec-POMDP solvers, and even better than many offline methods using explicit models.
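The equilibrium-seeking idea in the abstract (each agent's policy is repeatedly replaced by a best response to the others until nothing changes) can be illustrated with a minimal sketch. This is not the paper's MC-JESP algorithm: it assumes a hypothetical one-step cooperative game in place of a Dec-POMDP, plain actions in place of finite state controllers, and a toy `simulate` function standing in for the generative model. All names and reward values below are illustrative assumptions.

```python
import random

# Hypothetical shared mean reward for joint action (a0, a1); not from the paper.
MEAN_REWARD = [[1.0, 0.0],
               [0.0, 2.0]]


def simulate(a0, a1, rng):
    """Toy generative model: returns one noisy sample of the shared reward."""
    return MEAN_REWARD[a0][a1] + rng.gauss(0.0, 0.1)


def estimate_value(a0, a1, rng, n_samples=200):
    """Monte-Carlo estimate of the expected reward of a joint action."""
    return sum(simulate(a0, a1, rng) for _ in range(n_samples)) / n_samples


def best_response(agent, other_action, rng):
    """Best action for `agent` when the other agent's action is held fixed."""
    def value(action):
        joint = (action, other_action) if agent == 0 else (other_action, action)
        return estimate_value(joint[0], joint[1], rng)
    return max(range(2), key=value)


def alternating_best_response(initial, rng, max_rounds=20):
    """Improve one agent at a time until no agent changes its action."""
    policy = list(initial)
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            response = best_response(agent, policy[1 - agent], rng)
            if response != policy[agent]:
                policy[agent] = response
                changed = True
        if not changed:
            break  # every action is a best response to the other: an equilibrium
    return tuple(policy)


if __name__ == "__main__":
    rng = random.Random(0)
    for start in [(0, 0), (0, 1), (1, 1)]:
        print(start, "->", alternating_best_response(start, rng))
```

In this toy game, starting from `(0, 0)` the procedure stays at `(0, 0)`, which is an equilibrium with lower reward than `(1, 1)`. This reflects the trade-off the abstract describes: an equilibrium is easier to reach than a global optimum, and the result depends on initialization.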
