AILGJun 14, 2018

Learning in POMDPs with Monte Carlo Tree Search

arXiv:1806.05631v169 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of learning in partially observable environments for AI and robotics, representing an incremental advance in Bayesian reinforcement learning methods.

The paper tackles the problem of solving Bayes-Adaptive POMDPs, which are impractical for non-trivial domains, by extending Monte Carlo Tree Search (POMCP) to create BA-POMCP, enabling solution of previously unsolvable problems with proof of convergence.

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes