MLAILGFeb 1, 2025

Doubly Robust Monte Carlo Tree Search

arXiv:2502.01672v1h-index: 2
Originality Incremental advance
AI Analysis

This addresses sample efficiency for decision-making in complex environments like robotics or AI planning, though it appears incremental as it builds on existing MCTS and off-policy estimation methods.

The paper tackles the problem of sample inefficiency in Monte Carlo Tree Search (MCTS) by integrating Doubly Robust off-policy estimation, resulting in improved decision quality. In Tic-Tac-Toe, DR-MCTS achieved an 88% win rate versus 10% for standard MCTS, and in VirtualHome tasks, it attained a 20.7% success rate versus 10.3%.

We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes