Pessimistic Iterative Planning with RNNs for Robust POMDPs
This work addresses robust decision-making under uncertainty in partially observable environments, such as those in robotics and autonomous systems, and offers incremental improvements over existing methods.
The paper tackles the problem of computing robust memory-based policies for robust POMDPs under model uncertainty. It proposes the pessimistic iterative planning (PIP) framework together with the rFSCNet algorithm, which uses recurrent neural networks to optimize finite-state controllers, and reports better-performing policies than several baselines and a state-of-the-art robust POMDP solver.
Robust POMDPs extend classical POMDPs to incorporate model uncertainty using so-called uncertainty sets on the transition and observation functions, effectively defining ranges of probabilities. Policies for robust POMDPs must be (1) memory-based to account for partial observability and (2) robust against model uncertainty to account for the worst-case probability instances from the uncertainty sets. To compute such robust memory-based policies, we propose the pessimistic iterative planning (PIP) framework, which alternates between (1) selecting pessimistic POMDPs via worst-case probability instances from the uncertainty sets, and (2) computing finite-state controllers (FSCs) for these pessimistic POMDPs. Within PIP, we propose the rFSCNet algorithm, which optimizes a recurrent neural network to compute the FSCs. The empirical evaluation shows that rFSCNet can compute better-performing robust policies than several baselines and a state-of-the-art robust POMDP solver.
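The alternation in PIP can be illustrated with a small toy sketch in Python. It is not the authors' implementation, and several simplifications are assumptions made here: the toy model (INTERVALS, O, R, B0) and all function names (evaluate, robust_value, pip) are hypothetical. Only transition probabilities are uncertain, given as intervals on P(next state = 0 | s, a), and the observation model is treated as exact. The pessimistic POMDP is chosen by checking only the interval endpoints rather than the whole uncertainty set. Step (2) enumerates all 64 deterministic two-node FSCs by brute force in place of rFSCNet's recurrent neural network.

```python
import itertools

# Hypothetical toy robust POMDP: 2 states, 2 actions, 2 observations, 2 FSC nodes.
N_STATES, N_ACTIONS, N_OBS, N_NODES = 2, 2, 2, 2
GAMMA = 0.8
# INTERVALS[s][a] = (lo, hi) bounds on P(next state = 0 | s, a).
INTERVALS = [[(0.7, 0.9), (0.9, 1.0)],
             [(0.0, 0.2), (0.6, 0.9)]]
# Observation model O[s'][o], assumed exact (no uncertainty) for simplicity.
O = [[0.8, 0.2], [0.3, 0.7]]
R = [[1.0, 0.6], [0.0, -0.2]]  # R[s][a]
B0 = [0.5, 0.5]                # initial belief; the FSC starts in node 0


def trans(p_first, s2):
    """Probability of moving to s2 when P(next = 0) is p_first."""
    return p_first if s2 == 0 else 1.0 - p_first


def evaluate(fsc, p, iters=100):
    """Discounted value of a deterministic FSC on one POMDP instance p[s][a]."""
    act, upd = fsc  # act[n] = action in node n, upd[n][o] = next node
    V = [[0.0] * N_NODES for _ in range(N_STATES)]
    for _ in range(iters):  # policy evaluation on the product chain (s, n)
        V = [[R[s][act[n]] + GAMMA * sum(
                  trans(p[s][act[n]], s2) * O[s2][o] * V[s2][upd[n][o]]
                  for s2 in range(N_STATES) for o in range(N_OBS))
              for n in range(N_NODES)] for s in range(N_STATES)]
    return sum(B0[s] * V[s][0] for s in range(N_STATES))


def all_instances():
    """POMDP instances using only interval endpoints (a finite subset of the set)."""
    pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)]
    for choice in itertools.product((0, 1), repeat=len(pairs)):
        p = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for (s, a), c in zip(pairs, choice):
            p[s][a] = INTERVALS[s][a][c]
        yield p


def all_fscs():
    """Every deterministic FSC with N_NODES memory nodes."""
    for act in itertools.product(range(N_ACTIONS), repeat=N_NODES):
        for flat in itertools.product(range(N_NODES), repeat=N_NODES * N_OBS):
            upd = tuple(flat[n * N_OBS:(n + 1) * N_OBS] for n in range(N_NODES))
            yield act, upd


INSTANCES = list(all_instances())


def robust_value(fsc):
    """Worst case over the endpoint instances (may overestimate the true worst case)."""
    return min(evaluate(fsc, p) for p in INSTANCES)


def pip(fsc0, max_iters=10):
    fscs = list(all_fscs())
    best, best_val = fsc0, robust_value(fsc0)
    fsc = fsc0
    for _ in range(max_iters):
        # (1) Pessimistic POMDP: the instance that is worst for the current FSC.
        pess = min(INSTANCES, key=lambda p, f=fsc: evaluate(f, p))
        # (2) FSC for that POMDP (brute force here; rFSCNet trains an RNN instead).
        fsc = max(fscs, key=lambda f: evaluate(f, pess))
        val = robust_value(fsc)
        if val <= best_val:  # stop once the robust value no longer improves
            break
        best, best_val = fsc, val
    return best, best_val
```

For example, `pip(((0, 0), ((0, 0), (1, 1))))` starts from an FSC that always takes action 0 and returns the best FSC found together with its endpoint-based worst-case value. The loop keeps the best robust FSC seen so far, because an FSC that is optimal for one pessimistic POMDP can perform worse on a different worst-case instance; stopping when the robust value no longer improves is a simple termination rule chosen for this sketch.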