AIROMar 15, 2025

Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

arXiv:2503.12181v3h-index: 4
Originality Highly original
AI Analysis

This addresses a bottleneck in autonomous systems for continuous spaces, offering a novel method that is incremental but impactful for specific domains.

The authors tackled the problem of online planning in continuous POMDPs by developing AGMCTS, which integrates gradient optimization into Monte Carlo Tree Search, resulting in improved solution quality over sample-only solvers in benchmarks.

Autonomous systems that operate in continuous state, action, and observation spaces require planning and reasoning under uncertainty. Existing online planning methods for such POMDPs are almost exclusively sample-based, yet they forego the power of high-dimensional gradient optimization as combining it into Monte Carlo Tree Search (MCTS) has proved difficult, especially in non-parametric settings. We close this gap with three contributions. First, we derive a novel action-gradient theorem for both MDPs and POMDPs in terms of transition likelihoods, making gradient information accessible during tree search. Second, we introduce the Multiple Importance Sampling (MIS) tree, that re-uses samples for changing action branches, yielding consistent value estimates that enable in-search gradient steps. Third, we derive exact transition probability computation via the area formula for smooth generative models common in physical domains, a result of independent interest. These elements combine into Action-Gradient Monte Carlo Tree Search (AGMCTS), the first planner to blend non-parametric particle search with online gradient refinement in POMDPs. Across several challenging continuous MDP and POMDP benchmarks, AGMCTS outperforms widely-used sample-only solvers in solution quality.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes