LG ROJul 7, 2024

Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization

arXiv:2407.05511v14.61 citationsh-index: 5

Originality Incremental advance

AI Analysis

This work addresses exploration inefficiencies in MCTS for domains like robotics, offering a novel approach that bridges count-based exploration and sampling-based planning, though it appears incremental in its methodological advancement.

The paper tackled the challenge of long-horizon exploration in Monte Carlo Tree Search (MCTS) by developing Volume-MCTS, a tree search algorithm based on policy optimization with state occupancy regularization, which outperformed AlphaZero in robot navigation tasks with significantly better exploration properties.

Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call {\it Volume-MCTS}. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.

View on arXiv PDF

Similar