AILOROSYJun 5, 2024

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

arXiv:2406.02871v13 citations
Originality Incremental advance
AI Analysis

This addresses a core problem in model checking and sequential decision-making under uncertainty, offering improved performance for domain-specific applications.

The paper tackles the Maximal Reachability Probability Problem in undiscounted POMDPs by extending trial-based heuristic search value iteration to handle indefinite-horizon scenarios, resulting in an algorithm that outperforms existing methods in probability guarantees and computation time on benchmarks.

Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes