AISep 19, 2023

Monte-Carlo tree search with uncertainty propagation via optimal transport

Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D'Eramo, Odalric-Ambrym Maillard

arXiv:2309.10737v15.42 citationsh-index: 27

Originality Incremental advance

AI Analysis

This work addresses a domain-specific challenge in reinforcement learning for stochastic environments, offering incremental improvements to MCTS methods.

The paper tackles the problem of improving Monte-Carlo Tree Search (MCTS) for highly stochastic and partially observable Markov decision processes by introducing a novel backup strategy that models nodes as Gaussian distributions and uses Wasserstein barycenters for uncertainty propagation, resulting in outperformance over baselines in empirical evaluations.

This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) designed for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions. We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of $L^1$-Wasserstein barycenter with $α$-divergence, by drawing a notable connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.

View on arXiv PDF

Similar