AI · LG · May 22

MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

Qian-Rong Li, Hung Guei, I-Chen Wu, Ti-Rong Wu

arXiv:2605.24139 · 2.2

Predicted impact: top 92% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For researchers applying AlphaZero to imperfect-information games, MAPLE offers a more effective search method that balances performance and computational cost.

MAPLE proposes a tree search method for imperfect-information games that aggregates policy and value evaluations from multiple sampled world states, outperforming PIMC-based AlphaZero with Elo improvements of 291 in Phantom Go and 136 in Dark Hex.
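To put the reported Elo gains in perspective, the short sketch below converts an Elo difference into an expected score against the baseline. It assumes the standard logistic Elo model with a 400-point scale; the summary does not state which Elo variant the authors used, so the function name `elo_expected_score` and the resulting percentages are illustrative only.

```python
def elo_expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the stronger side.

    Assumes the standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Elo gains over the PIMC-based AlphaZero baseline, as reported in the summary.
reported_gains = {"Phantom Go": 291, "Dark Hex": 136}

for game, gain in reported_gains.items():
    score = elo_expected_score(gain)
    print(f"{game}: +{gain} Elo -> expected score of about {score:.1%}")
```

Under this assumption, a 291-point gain corresponds to an expected score of roughly 84% against the baseline, and a 136-point gain to roughly 69%.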

Imperfect-information games (IIGs) are challenging, as players must make decisions without fully observing the true game state. While AlphaZero has achieved remarkable success in perfect-information games, extending it to IIGs remains difficult. Existing search-based approaches, such as Perfect Information Monte Carlo (PIMC), suffer from strategy fusion, while Information Set Monte Carlo Tree Search (IS-MCTS) incurs high computational cost when combined with neural networks. In this paper, we propose Multi-State Aggregated PoLicy Evaluation (MAPLE), a tree search method that aggregates policy and value evaluations from multiple sampled world states within a single search tree, combining the advantages of PIMC and IS-MCTS while maintaining a controllable computational cost. We further incorporate a Siamese-based sampling strategy to select informative world states from the information set. Experiments on Phantom Go and Dark Hex show that MAPLE significantly outperforms the PIMC-based AlphaZero baseline, achieving Elo improvements of 291 and 136, respectively. These results demonstrate that MAPLE is an effective approach for AlphaZero-style learning in imperfect-information games.
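The core idea of aggregating evaluations across sampled world states can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the aggregation rule, so the weighted averaging here is an assumption, and the names `aggregate_evaluations`, `evaluate`, and `toy_evaluate` are hypothetical. In a real system, `evaluate` would be a neural network call and the sampled states would come from the information set (for example, via the Siamese-based sampler the abstract mentions).

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Action = Hashable
Evaluation = Tuple[Dict[Action, float], float]  # (policy over actions, value)


def aggregate_evaluations(
    world_states: Sequence[object],
    evaluate: Callable[[object], Evaluation],
    weights: Optional[List[float]] = None,
) -> Evaluation:
    """Combine per-state (policy, value) evaluations into one estimate.

    Assumption: a weighted average, uniform by default. The paper's actual
    aggregation rule may differ.
    """
    if not world_states:
        raise ValueError("need at least one sampled world state")
    if weights is None:
        weights = [1.0] * len(world_states)
    if len(weights) != len(world_states):
        raise ValueError("weights and world_states must have equal length")

    total = sum(weights)
    policy: Dict[Action, float] = {}
    value = 0.0
    for state, weight in zip(world_states, weights):
        state_policy, state_value = evaluate(state)
        share = weight / total
        for action, prob in state_policy.items():
            policy[action] = policy.get(action, 0.0) + share * prob
        value += share * state_value
    return policy, value


def toy_evaluate(state: object) -> Evaluation:
    """Hypothetical stand-in for a network: two worlds that disagree."""
    table = {
        "w1": ({"a": 0.8, "b": 0.2}, 1.0),
        "w2": ({"a": 0.2, "b": 0.8}, -1.0),
    }
    return table[state]


policy, value = aggregate_evaluations(["w1", "w2"], toy_evaluate)
print(policy, value)
```

In the toy example, the two sampled worlds prefer different actions and disagree on the outcome, so the aggregated node sees an even policy and a neutral value. A PIMC-style search would instead solve each world separately and only combine the results at the root, which is where strategy fusion arises.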
