AI LGJun 14, 2018

Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt, Frans A. Oliehoek, Christopher Amato

arXiv:1806.05631v120.069 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of learning in partially observable environments for AI and robotics, representing an incremental advance in Bayesian reinforcement learning methods.

The paper tackles the problem of solving Bayes-Adaptive POMDPs, which are impractical for non-trivial domains, by extending Monte Carlo Tree Search (POMCP) to create BA-POMCP, enabling solution of previously unsolvable problems with proof of convergence.

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

View on arXiv PDF

Similar