AINov 13, 2023
Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and PracticeIdan Lev-Yehudi, Moran Barenboim, Vadim Indelman
Solving partially observable Markov decision processes (POMDPs) with high dimensional and continuous observations, such as camera images, is required for many real life robotics and planning problems. Recent researches suggested machine learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We deal with the question of what would be the implication of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value w.r.t. original model, from the empirical planned value with the simplified model, by generalizing recent results of particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
AIMar 3, 2023
Data Association Aware POMDP Planning with Hypothesis Pruning Performance GuaranteesMoran Barenboim, Idan Lev-Yehudi, Vadim Indelman
Autonomous agents that operate in the real world must often deal with partial observability, which is commonly modeled as partially observable Markov decision processes (POMDPs). However, traditional POMDP models rely on the assumption of complete knowledge of the observation source, known as fully observable data association. To address this limitation, we propose a planning algorithm that maintains multiple data association hypotheses, represented as a belief mixture, where each component corresponds to a different data association hypothesis. However, this method can lead to an exponential growth in the number of hypotheses, resulting in significant computational overhead. To overcome this challenge, we introduce a pruning-based approach for planning with ambiguous data associations. Our key contribution is to derive bounds between the value function based on the complete set of hypotheses and the value function based on a pruned-subset of the hypotheses, enabling us to establish a trade-off between computational efficiency and performance. We demonstrate how these bounds can both be used to certify any pruning heuristic in retrospect and propose a novel approach to determine which hypotheses to prune in order to ensure a predefined limit on the loss. We evaluate our approach in simulated environments and demonstrate its efficacy in handling multi-modal belief hypotheses with ambiguous data associations.
AISep 12, 2025
Online Robust Planning under Model Uncertainty: A Sample-Based ApproachTamir Shazman, Idan Lev-Yehudi, Ron Benchetit et al.
Online planning in Markov Decision Processes (MDPs) enables agents to make sequential decisions by simulating future trajectories from the current state, making it well-suited for large-scale or dynamic environments. Sample-based methods such as Sparse Sampling and Monte Carlo Tree Search (MCTS) are widely adopted for their ability to approximate optimal actions using a generative model. However, in practical settings, the generative model is often learned from limited data, introducing approximation errors that can degrade performance or lead to unsafe behaviors. To address these challenges, Robust MDPs (RMDPs) offer a principled framework for planning under model uncertainty, yet existing approaches are typically computationally intensive and not suited for real-time use. In this work, we introduce Robust Sparse Sampling (RSS), the first online planning algorithm for RMDPs with finite-sample theoretical performance guarantees. Unlike Sparse Sampling, which estimates the nominal value function, RSS computes a robust value function by leveraging the efficiency and theoretical properties of Sample Average Approximation (SAA), enabling tractable robust policy computation in online settings. RSS is applicable to infinite or continuous state spaces, and its sample and computational complexities are independent of the state space size. We provide theoretical performance guarantees and empirically show that RSS outperforms standard Sparse Sampling in environments with uncertain dynamics.
AIMar 15, 2025
Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPsIdan Lev-Yehudi, Michael Novitsky, Moran Barenboim et al.
Autonomous systems that operate in continuous state, action, and observation spaces require planning and reasoning under uncertainty. Existing online planning methods for such POMDPs are almost exclusively sample-based, yet they forego the power of high-dimensional gradient optimization as combining it into Monte Carlo Tree Search (MCTS) has proved difficult, especially in non-parametric settings. We close this gap with three contributions. First, we derive a novel action-gradient theorem for both MDPs and POMDPs in terms of transition likelihoods, making gradient information accessible during tree search. Second, we introduce the Multiple Importance Sampling (MIS) tree, that re-uses samples for changing action branches, yielding consistent value estimates that enable in-search gradient steps. Third, we derive exact transition probability computation via the area formula for smooth generative models common in physical domains, a result of independent interest. These elements combine into Action-Gradient Monte Carlo Tree Search (AGMCTS), the first planner to blend non-parametric particle search with online gradient refinement in POMDPs. Across several challenging continuous MDP and POMDP benchmarks, AGMCTS outperforms widely-used sample-only solvers in solution quality.
AIFeb 4, 2025
Anytime Incremental $ρ$POMDP Planning in Continuous SpacesRon Benchetrit, Idan Lev-Yehudi, Andrey Zhitnikov et al.
Partially Observable Markov Decision Processes (POMDPs) provide a robust framework for decision-making under uncertainty in applications such as autonomous driving and robotic exploration. Their extension, $ρ$POMDPs, introduces belief-dependent rewards, enabling explicit reasoning about uncertainty. Existing online $ρ$POMDP solvers for continuous spaces rely on fixed belief representations, limiting adaptability and refinement - critical for tasks such as information-gathering. We present $ρ$POMCPOW, an anytime solver that dynamically refines belief representations, with formal guarantees of improvement over time. To mitigate the high computational cost of updating belief-dependent rewards, we propose a novel incremental computation approach. We demonstrate its effectiveness for common entropy estimators, reducing computational cost by orders of magnitude. Experimental results show that $ρ$POMCPOW outperforms state-of-the-art solvers in both efficiency and solution quality.