AI · Oct 27, 2025

AUPO -- Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm

arXiv:2510.23214v1 · 3 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This is an incremental improvement for AI planning and decision-making systems, addressing limitations in existing abstraction methods.

The paper improves Monte Carlo Tree Search (MCTS) by introducing AUPO, a reward-distribution-based action abstraction algorithm that outperforms MCTS on IPPC benchmark problems.

We introduce a novel, drop-in modification to the decision policy of Monte Carlo Tree Search (MCTS) that we call AUPO. Comparisons on a range of IPPC benchmark problems show that AUPO clearly outperforms MCTS. AUPO is an automatic action abstraction algorithm that relies solely on reward distribution statistics acquired during the MCTS run. Thus, unlike other automatic abstraction algorithms, AUPO requires neither access to transition probabilities nor a directed acyclic search graph to build its abstraction. This allows AUPO to detect symmetric actions that state-of-the-art frameworks like ASAP struggle with when the resulting symmetric states are far apart in state space. Furthermore, as AUPO only affects the decision policy, it is not mutually exclusive with other abstraction techniques that only affect the tree search.
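
To make the idea of a decision-time abstraction driven only by reward statistics more concrete, here is a minimal Python sketch. It is not the paper's algorithm. The confidence-interval overlap test, the greedy grouping, the tie-break by visit count, the `ActionStats` container, and the `z` parameter are all illustrative assumptions. The only point it shares with the abstract is the principle: root actions stay grouped until their reward statistics show they differ, and no transition probabilities or search-graph structure are needed.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Reward statistics for one root action (hypothetical container)."""
    n: int        # number of sampled returns
    mean: float   # sample mean of the returns
    var: float    # sample variance of the returns


def mean_interval(s: ActionStats, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean return."""
    half = z * math.sqrt(s.var / s.n)
    return s.mean - half, s.mean + half


def indistinguishable(a: ActionStats, b: ActionStats, z: float = 1.96) -> bool:
    """Assumed test: two actions stay abstracted while their intervals overlap."""
    lo_a, hi_a = mean_interval(a, z)
    lo_b, hi_b = mean_interval(b, z)
    return lo_a <= hi_b and lo_b <= hi_a


def abstract_actions(stats: dict[str, ActionStats], z: float = 1.96) -> list[list[str]]:
    """Greedily group actions; an action joins a group only if it is
    indistinguishable from every current member of that group."""
    groups: list[list[str]] = []
    for action, s in stats.items():
        for group in groups:
            if all(indistinguishable(s, stats[m], z) for m in group):
                group.append(action)
                break
        else:
            groups.append([action])
    return groups


def pooled_mean(group: list[str], stats: dict[str, ActionStats]) -> float:
    """Visit-weighted mean return over all actions in a group."""
    total = sum(stats[a].n for a in group)
    return sum(stats[a].n * stats[a].mean for a in group) / total


def decide(stats: dict[str, ActionStats], z: float = 1.96) -> str:
    """Pick the group with the best pooled mean, then its most-visited action."""
    groups = abstract_actions(stats, z)
    best = max(groups, key=lambda g: pooled_mean(g, stats))
    return max(best, key=lambda a: stats[a].n)
```

Because the sketch only consumes per-action statistics that a search has already collected, it changes nothing about how the tree is built. That matches the abstract's remark that a decision-policy change can be combined with abstractions that act on the tree search itself.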

