RO · AI · Apr 13

3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

arXiv:2604.11302 · 40.7 · h-index: 4
Predicted impact: top 55% in RO · last 90 days · Originality: Incremental advance
AI Analysis

For robotic manipulation tasks requiring spatial memory, this work provides a significant improvement over reactive baselines by enabling persistent reasoning about occluded objects.

3D-ALP combines MCTS with a 3D-consistent world model to enable persistent spatial memory for robotic manipulation, achieving 0.650 success on memory-required steps versus 0.006 for a greedy baseline (Δ=+0.645) and 0.822 on step 5 versus 0.000.

We present 3D-Anchored Lookahead Planning (3D-ALP), a System 2 reasoning engine for robotic manipulation that combines Monte Carlo Tree Search (MCTS) with a 3D-consistent world model as the rollout oracle. Unlike reactive policies that evaluate actions from the current camera frame only, 3D-ALP maintains a persistent camera-to-world (c2w) anchor that survives occlusion, enabling accurate replanning to object positions that are no longer directly observable. On a 5-step sequential reach task requiring spatial memory (Experiment E3), 3D-ALP achieves a 0.650 ± 0.109 success rate on memory-required steps versus 0.006 ± 0.008 for a greedy reactive baseline (Δ=+0.645), while step 5 success reaches 0.822 against 0.000 for greedy. An ablation study (30 episodes, 3 seeds) isolates tree search spatial memory as the primary driver (+0.533, 82% of gain) with additional benefit from deeper lookahead (+0.111, 17%). We also identify and resolve four structural failure modes in applying UCT-MCTS (Upper Confidence Bounds applied to Trees [10]) to continuous robotic manipulation.
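
Two ingredients named in the abstract, the persistent c2w anchor and the UCT selection rule, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 4x4 homogeneous pose convention, the exploration constant c = sqrt(2), and the example poses are all assumptions chosen for illustration. The point is only that a position stored in world coordinates stays valid after the camera moves or the object is occluded, and that UCT trades off a child's mean value against how rarely it has been visited.

```python
import math
import numpy as np

def camera_to_world(c2w: np.ndarray, p_cam: np.ndarray) -> np.ndarray:
    """Map a 3D point from camera to world coordinates (c2w is an assumed 4x4 pose)."""
    return (c2w @ np.append(p_cam, 1.0))[:3]

def world_to_camera(c2w: np.ndarray, p_world: np.ndarray) -> np.ndarray:
    """Re-express a stored world-frame point in the frame of a (possibly new) camera pose."""
    return (np.linalg.inv(c2w) @ np.append(p_world, 1.0))[:3]

def uct_score(mean_value: float, parent_visits: int, child_visits: int,
              c: float = math.sqrt(2)) -> float:
    """Standard UCT score: mean value plus an exploration bonus (c is an assumed constant)."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)

# Hypothetical poses: identity rotation, camera translated in the world frame.
c2w_first = np.eye(4)
c2w_first[:3, 3] = [1.0, 0.0, 0.0]

# The object is observed once, 2 units in front of the first camera pose,
# and its position is anchored in world coordinates.
anchor_world = camera_to_world(c2w_first, np.array([0.0, 0.0, 2.0]))

# The camera later moves and the object is occluded; the anchor is still usable.
c2w_later = np.eye(4)
c2w_later[:3, 3] = [0.0, 1.0, 0.0]
target_in_new_frame = world_to_camera(c2w_later, anchor_world)
```

A planner built along these lines would query `anchor_world` when scoring candidate actions inside the tree search, rather than relying on what is visible in the current frame.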

