AI · Apr 25, 2024

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

arXiv:2404.16364v5 · 2 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the wall-clock cost of reanalysis for researchers and practitioners who use MCTS-based algorithms in decision-making domains. It is an incremental advance that builds on existing methods such as MuZero.

The authors tackle the high computational cost of reanalyzing stale data in MCTS-based algorithms such as MuZero. Their method, ReZero, combines backward-view reuse with entire-buffer reanalysis to reduce search time while maintaining or improving performance. Experiments on Atari, DMControl, and board games report substantial training speedups.

Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which uses the value estimation of a certain child node to save the corresponding sub-tree search time. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile guarantee or even improve performance, simplifying both data collecting and reanalyzing. Experiments conducted on Atari environments, DMControl suites and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero MCTS benchmark at https://github.com/opendilab/LightZero.
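The backward-view reuse idea in the abstract can be illustrated with a toy sketch. This is not the LightZero implementation: it replaces tree search with a UCB1-style bandit at the root, and `root_search`, `reanalyze_backward`, the `evaluate` callback (a stand-in for an expensive sub-tree search), and the trajectory layout are all hypothetical names chosen for illustration. The trajectory is processed from the last step to the first, so the value already computed for the next state can stand in for the child reached by the action actually taken, and that arm never triggers a fresh evaluation.

```python
import math
from typing import Callable, List, Optional, Tuple

Step = Tuple[int, int, float]  # (state, action taken, reward); hypothetical layout


def root_search(
    num_actions: int,
    evaluate: Callable[[int], float],
    num_simulations: int,
    known: Optional[Tuple[int, float]] = None,
    c: float = 1.0,
) -> Tuple[float, int]:
    """UCB1-style search over root actions (a simplification of MCTS).

    `known` is an optional (action, value) pair whose value is reused
    instead of calling `evaluate`. Returns (root value, evaluate calls).
    """
    counts = [0] * num_actions
    totals = [0.0] * num_actions
    calls = 0
    for t in range(1, num_simulations + 1):
        def score(a: int) -> float:
            if counts[a] == 0:
                return math.inf  # try every action once first
            mean = totals[a] / counts[a]
            return mean + c * math.sqrt(math.log(t) / counts[a])

        a = max(range(num_actions), key=score)
        if known is not None and a == known[0]:
            v = known[1]  # reuse the cached value, skip the sub-tree search
        else:
            v = evaluate(a)  # stand-in for an expensive sub-tree search
            calls += 1
        counts[a] += 1
        totals[a] += v
    return sum(totals) / num_simulations, calls


def reanalyze_backward(
    trajectory: List[Step],
    num_actions: int,
    evaluate: Callable[[int, int], float],
    num_simulations: int,
    gamma: float = 0.99,
    reuse: bool = True,
) -> Tuple[List[float], int]:
    """Reanalyze a trajectory from its last step to its first."""
    values = [0.0] * len(trajectory)
    total_calls = 0
    next_value: Optional[float] = None
    for t in reversed(range(len(trajectory))):
        state, action, reward = trajectory[t]
        known = None
        if reuse and next_value is not None:
            # The taken action leads to the next state, already searched.
            known = (action, reward + gamma * next_value)
        values[t], calls = root_search(
            num_actions, lambda a: evaluate(state, a), num_simulations, known
        )
        total_calls += calls
        next_value = values[t]
    return values, total_calls
```

Calling `reanalyze_backward` twice on the same trajectory, once with `reuse=True` and once with `reuse=False`, and comparing the returned call counts shows the saving in this toy setting: every step except the last skips at least one evaluation.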

Code Implementations: 1 repo
