AIROOct 15, 2023

Recursively-Constrained Partially Observable Markov Decision Processes

arXiv:2310.09688v33 citationsh-index: 21
Originality Incremental advance
AI Analysis

This work addresses safety-critical sequential decision problems with constraints under uncertainty, offering a novel framework to improve policy reliability, though it is incremental as it builds on existing C-POMDP models.

The authors tackled the issue of constrained Partially Observable Markov Decision Processes (C-POMDPs) violating optimal substructure, leading to undesirable behaviors and ineffective online re-planning, by introducing Recursively-Constrained POMDPs (RC-POMDPs) with history-dependent constraints, which ensure deterministic optimal policies and adherence to Bellman's principle, and demonstrated efficacy on benchmark problems with more desirable behaviors than C-POMDPs.

Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDP) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes