8.2AIApr 22
Computing the Reachability Value of Posterior-Deterministic POMDPsNathanaël Fijalkow, Arka Ghosh, Roman Kniazev et al.
Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.
AIDec 16, 2024
Revelations: A Decidable Class of POMDPs with Omega-Regular ObjectivesMarius Belly, Nathanaël Fijalkow, Hugo Gimbert et al.
Partially observable Markov decision processes (POMDPs) form a prominent model for uncertainty in sequential decision making. We are interested in constructing algorithms with theoretical guarantees to determine whether the agent has a strategy ensuring a given specification with probability 1. This well-studied problem is known to be undecidable already for very simple omega-regular objectives, because of the difficulty of reasoning on uncertain events. We introduce a revelation mechanism which restricts information loss by requiring that almost surely the agent has eventually full information of the current state. Our main technical results are to construct exact algorithms for two classes of POMDPs called weakly and strongly revealing. Importantly, the decidable cases reduce to the analysis of a finite belief-support Markov decision process. This yields a conceptually simple and exact algorithm for a large class of POMDPs.
AISep 29, 2025
LTL$_f$ Learning Meets Boolean Set CoverGabriel Bathie, Nathanaël Fijalkow, Théo Matricon et al.
Learning formulas in Linear Temporal Logic (LTLf) from finite traces is a fundamental research problem which has found applications in artificial intelligence, software engineering, programming languages, formal methods, control of cyber-physical systems, and robotics. We implement a new CPU tool called Bolt improving over the state of the art by learning formulas more than 100x faster over 70% of the benchmarks, with smaller or equal formulas in 98% of the cases. Our key insight is to leverage a problem called Boolean Set Cover as a subroutine to combine existing formulas using Boolean connectives. Thanks to the Boolean Set Cover component, our approach offers a novel trade-off between efficiency and formula size.