Value-Directed Sampling Methods for POMDPs
This work addresses efficient decision making in partially observable environments. It is an incremental contribution that adapts existing particle filtering methods to the POMDP setting.
The paper tackles approximate belief-state monitoring in POMDPs using particle filtering. It derives error bounds on decision quality under importance sampling and proposes an adaptive procedure that determines the number of samples needed to meet a specified error bound. Empirical evidence indicates that the procedure directs sampling effort to where it is needed to distinguish policies.
We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially observable Markov decision process (POMDP). While particle filtering has become a widely used tool in AI for monitoring dynamical systems, scant attention has been paid to its use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies.
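As a rough illustration of the idea, the Python sketch below runs one importance-sampling filter step on a toy two-state model and doubles the number of particles until the estimated values of the two best actions are separated by more than a confidence width. Everything in it is an assumption made for illustration: the model probabilities, the alpha-vector value function, the function names, and the stopping rule. The stopping rule is a Hoeffding-style heuristic, not the error bound derived in the paper.

```python
import math
import random

# Hypothetical two-state model; all numbers are illustrative assumptions.
STATES = (0, 1)
TRANSITION = {0: (0.9, 0.1), 1: (0.2, 0.8)}          # P(s' | s)
OBS_LIKELIHOOD = {0: (0.85, 0.15), 1: (0.15, 0.85)}  # P(o | s'), indexed [s'][o]
# Assumed value function: one alpha vector (value per state) for each action.
ALPHA = {"left": (10.0, -5.0), "right": (-5.0, 10.0)}


def filter_step(particles, observation, n, rng):
    """One importance-sampling step with n particles.

    Draw n ancestors uniformly from the prior particle set, propagate each
    through the transition model, then weight by the observation likelihood.
    Returns the proposed states and their normalized weights.
    """
    ancestors = rng.choices(particles, k=n)
    proposed = [rng.choices(STATES, weights=TRANSITION[s])[0] for s in ancestors]
    weights = [OBS_LIKELIHOOD[s][observation] for s in proposed]
    total = sum(weights)
    return proposed, [w / total for w in weights]


def action_values(states, weights):
    """Estimate each action's value as the weighted average of its alpha vector."""
    return {
        action: sum(w * alpha[s] for s, w in zip(states, weights))
        for action, alpha in ALPHA.items()
    }


def choose_action(prior_particles, observation,
                  n_start=50, n_max=6400, delta=0.05, seed=0):
    """Double the particle count until the top two actions look separated.

    The half-width below is a Hoeffding-style heuristic (an assumption of
    this sketch), used only to decide when to stop adding samples.
    """
    rng = random.Random(seed)
    all_values = [v for alpha in ALPHA.values() for v in alpha]
    value_range = max(all_values) - min(all_values)
    n = n_start
    while True:
        states, weights = filter_step(prior_particles, observation, n, rng)
        q = action_values(states, weights)
        ranked = sorted(q, key=q.get, reverse=True)
        gap = q[ranked[0]] - q[ranked[1]]
        half_width = value_range * math.sqrt(math.log(2 / delta) / (2 * n))
        if gap > 2 * half_width or n >= n_max:
            return ranked[0], n, q
        n *= 2


if __name__ == "__main__":
    prior = [0] * 80 + [1] * 20  # prior belief: state 0 with probability 0.8
    action, n_used, q = choose_action(prior, observation=0)
    print(action, n_used, q)
```

In this sketch, a belief that clearly favors one action should stop at a small particle count, while a belief near the boundary between the two alpha vectors would tend to trigger further doubling, which is the sense in which sampling effort is spent only where it helps distinguish actions.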