Multi-Environment POMDPs with Finite-Horizon Objectives
For researchers in planning under uncertainty, this work extends complexity results and offers a more efficient algorithm for a class of POMDPs with adversarial initial states.
The paper studies multi-environment POMDPs with finite-horizon objectives, proving PSPACE-completeness and presenting a practical algorithm that outperforms prior work on benchmarks.
Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.