AI · Mar 21, 2021

Monte Carlo Information-Oriented Planning

Vincent Thomas, Gérémy Hutin, Olivier Buffet

arXiv:2103.11345v16.12 citations

Originality: Incremental advance

AI Analysis

This work addresses planning under uncertainty for AI systems, but it is an incremental contribution, as it builds on the existing POMCP algorithm.

The paper tackles the problem of solving information-gathering tasks modeled as rho-POMDPs, where rewards depend on belief states, by proposing a Monte Carlo Tree Search algorithm that outperforms myopic approaches in experiments.
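To make "rewards depend on belief states" concrete, here is a minimal sketch that assumes rho is the negative Shannon entropy of the belief, a common information-gathering criterion that is convex in the belief. It also assumes the belief is approximated by a bag of sampled states, as a particle-based planner would hold it. The function name `neg_entropy_reward` is hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def neg_entropy_reward(particles):
    """Hypothetical rho(b): negative Shannon entropy (in nats) of the
    empirical belief defined by a list of sampled states.

    A belief concentrated on one state scores 0 (the maximum); more
    spread-out beliefs score lower, so maximizing rho rewards gathering
    information that sharpens the belief."""
    counts = Counter(particles)
    n = len(particles)
    return sum((c / n) * math.log(c / n) for c in counts.values())


# A certain belief vs. a uniform belief over two states.
print(neg_entropy_reward(["s0", "s0", "s0"]))  # 0.0
print(neg_entropy_reward(["s0", "s1"]))        # -log(2), about -0.693
```

Note that this reward cannot be computed from a single sampled state: it needs a set of states that represents the belief, which is why a state-by-state simulator like POMCP has to be adapted for rho-POMDPs.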

In this article, we discuss how to solve information-gathering problems expressed as rho-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward rho depends on the belief state. Point-based approaches used for solving POMDPs have been extended to solving rho-POMDPs as belief MDPs when the reward rho is convex over the belief space B or when it is Lipschitz-continuous. In the present paper, we build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs, aiming for an efficient on-line planner that can be used with any rho function. Because rewards depend on beliefs, adaptations are required to (i) propagate more than one state at a time, and (ii) prevent biases in value estimates. An asymptotic convergence proof to epsilon-optimal values is given for continuous rho. Experiments are conducted to analyze the algorithms at hand and show that they outperform myopic approaches.
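Adaptation (i) can be illustrated with one simple scheme, which is not necessarily the authors' exact method. A simulation carries a bag of states instead of a single state, and each state is pushed through a generative model s, a -> (s', o). The observation emitted by the first particle is taken as the simulated observation, and only successors that produced that same observation are kept (rejection sampling), so the child node receives a small particle set from which a belief-dependent reward can be estimated. The names `propagate_bag` and `parity_model` and the toy model itself are hypothetical.

```python
import random


def propagate_bag(bag, action, generative_model, rng):
    """Hypothetical sketch: push a bag of states through a generative model
    and keep only successors consistent with one sampled observation.

    Returns (obs, next_bag): the observation this simulation follows and the
    particles approximating the next belief given (action, obs). next_bag is
    never empty, because the first particle always matches its own obs."""
    steps = [generative_model(s, action, rng) for s in bag]
    _, obs = steps[0]  # observation followed by this simulation
    next_bag = [s_next for s_next, o in steps if o == obs]
    return obs, next_bag


def parity_model(state, action, rng):
    """Toy generative model (hypothetical): the state never changes and the
    observation is its parity; `action` and `rng` are unused here."""
    return state, state % 2


rng = random.Random(0)
print(propagate_bag([0, 1, 2, 3, 0], "listen", parity_model, rng))
# (0, [0, 2, 0])
```

With a real stochastic model, `rng` would drive the sampled transitions and observations, and the particles reaching a node across many simulations would form the set on which rho is evaluated.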

