AI · Jul 17, 2018

Preference-Based Monte Carlo Tree Search

Tobias Joppen, Christian Wirth, Johannes Fürnkranz

arXiv:1807.06286v1 · 4 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying MCTS in domains where numeric feedback is hard to specify, potentially extending its use to problems with an ordinal structure. It appears incremental, however, since it adapts an existing method rather than introducing a new one.

The paper tackles the problem of Monte Carlo tree search (MCTS) relying on numeric feedback, which can be difficult to define and prone to bias, by proposing a variant that uses only qualitative feedback, enabling new applications. In a puzzle domain, it achieves performance comparable to a regular MCTS baseline with quantitative feedback.

Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define. Real-time MCTS is a variant which may only rarely encounter states with an explicit, extrinsic reward. To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent. Recent work has shown evidence that in several areas the underlying structure is ordinal rather than numerical. Hence erroneous and biased heuristics are inevitable, especially in such domains. In this paper, we propose an MCTS variant which only depends on qualitative feedback, and therefore opens up new applications for MCTS. We also find indications that translating absolute into ordinal feedback may be beneficial. Using a puzzle domain, we show that our preference-based MCTS variant, which only receives qualitative feedback, is able to reach a performance level comparable to a regular MCTS baseline, which obtains quantitative feedback.
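The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of what "qualitative feedback only" can look like at a single tree node. Absolute rollout returns are translated into ordinal outcomes by keeping only their order (`to_preference`). The node then stores pairwise win counts between child actions instead of averaged rewards. The rule in `select_pair` is a simplified dueling-bandit style heuristic chosen for illustration and is not claimed to be the authors' method. The class name `PreferenceStats`, the exploration constant `c`, and the toy returns are all hypothetical, and tree expansion and rollouts are omitted.

```python
import math
import random
from collections import defaultdict


def to_preference(ret_a, ret_b):
    """Translate two absolute returns into an ordinal outcome for the first:
    1.0 = preferred, 0.0 = dispreferred, 0.5 = tie. Only the order survives."""
    if ret_a > ret_b:
        return 1.0
    if ret_a < ret_b:
        return 0.0
    return 0.5


class PreferenceStats:
    """Hypothetical per-node statistics: pairwise comparison outcomes between
    child actions are stored instead of averaged numeric rewards."""

    def __init__(self, actions, c=1.0, rng=None):
        self.actions = list(actions)
        self.c = c                        # exploration constant (assumed value)
        self.rng = rng or random.Random(0)
        self.wins = defaultdict(float)    # wins[(i, j)]: times i was preferred to j
        self.duels = defaultdict(int)     # duels[(i, j)]: comparisons between i and j
        self.total = 0                    # total comparisons made at this node

    def record(self, i, j, outcome):
        """Store one comparison; outcome is to_preference(return_i, return_j)."""
        self.wins[(i, j)] += outcome
        self.wins[(j, i)] += 1.0 - outcome
        self.duels[(i, j)] += 1
        self.duels[(j, i)] += 1
        self.total += 1

    def _upper(self, i, j):
        """Optimistic estimate of P(i preferred to j); unexplored pairs get 1.0."""
        n = self.duels[(i, j)]
        if n == 0:
            return 1.0
        return self.wins[(i, j)] / n + self.c * math.sqrt(math.log(self.total + 1) / n)

    def select_pair(self):
        """Pick a candidate that optimistically beats the most rivals, then the
        rival most likely to beat it (a simplified dueling-bandit style rule)."""
        def score(i):
            return sum(self._upper(i, j) >= 0.5 for j in self.actions if j != i)

        best = max(score(i) for i in self.actions)
        candidate = self.rng.choice([i for i in self.actions if score(i) == best])
        rivals = [j for j in self.actions if j != candidate]
        rival = max(rivals, key=lambda j: self._upper(j, candidate))
        return candidate, rival

    def recommend(self):
        """Return the action winning the most pairwise duels on empirical rates
        (a Copeland-style winner); ties go to the earlier action."""
        def copeland(i):
            return sum(
                self.duels[(i, j)] > 0 and self.wins[(i, j)] / self.duels[(i, j)] > 0.5
                for j in self.actions if j != i
            )
        return max(self.actions, key=copeland)


# Toy usage with hypothetical returns: they are only ever compared, never averaged.
hidden_return = {"left": 0.2, "up": 0.9, "right": 0.5}
stats = PreferenceStats(hidden_return)
for _ in range(200):
    a, b = stats.select_pair()
    stats.record(a, b, to_preference(hidden_return[a], hidden_return[b]))
print(stats.recommend())  # prints: up
```

Because `to_preference` depends only on the order of two returns, applying any strictly increasing transformation to `hidden_return` would produce exactly the same comparisons and the same recommendation. In this sketch, that is the practical sense in which ordinal feedback sidesteps the need for a well-calibrated numeric heuristic.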

