AI LGApr 11, 2024

Monte Carlo Tree Search with Boltzmann Exploration

Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda

Oxford

arXiv:2404.07732v114.015 citationsh-index: 19Has CodeNIPS

Originality Incremental advance

AI Analysis

This addresses a limitation in automated planning for AI systems, particularly in games like Go, but is incremental as it builds on existing MCTS methods.

The paper tackled the problem that Maximum ENtropy Tree-Search (MENTS) in Monte Carlo Tree Search may not align optimal actions with the original objective, introducing Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS) to address this while maintaining benefits like efficient sampling, with empirical results showing consistent high performance in benchmarks like Go.

Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.

View on arXiv PDF Code

Similar