AIMay 17, 2024

What should be observed for optimal reward in POMDPs?

Alyzia-Maria Konsta, Alberto Lluch Lafuente, Christoph Matheja

arXiv:2405.10768v25.83 citationsh-index: 4Has CodeCAV

Originality Incremental advance

AI Analysis

This addresses a novel problem for system designers in AI and robotics who need to cost-effectively select sensors for agents in uncertain environments, though it is incremental as it builds on existing POMDP frameworks.

The paper tackles the optimal observability problem in POMDPs, determining how to adjust observation capabilities within a budget to keep expected reward below a threshold, showing it is undecidable in general but decidable for positional strategies and presenting algorithms with promising results on standard examples.

Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.

View on arXiv PDF Code

Similar