AI · RO · SY · Mar 15, 2016

Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains

arXiv:1603.04586v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses information maximization under constraints, a topic of interest to researchers in stochastic processes and decision-making. It appears incremental because it builds on existing multi-armed bandit relaxations.

The paper tackles sequential decision-making under uncertainty in mixed observability domains. It derives an upper bound on the optimal value function by relaxing constraints and uses that bound to prune the search space. Simulation experiments in a target monitoring domain show that the pruning is effective.

Sequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.
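To illustrate how an upper bound on the value function can prune a search, here is a minimal sketch of depth-first branch and bound on a toy sensing problem. It is not the paper's algorithm: the toy domain (`step`), the hand-made bound (`relaxed_bound`), and all function names are assumptions made for illustration, and the bound is a simple relaxation rather than the multi-armed bandit bound described in the abstract.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # hypothetical: one "uncertainty" level per target


def actions(state: State) -> List[int]:
    """Any target may be sensed (toy assumption: no internal-state constraint)."""
    return list(range(len(state)))


def step(state: State, a: int) -> Tuple[float, State]:
    """Sensing target a pays its current uncertainty and resets it to 0;
    every other target's uncertainty grows by 1."""
    reward = float(state[a])
    nxt = tuple(0 if i == a else u + 1 for i, u in enumerate(state))
    return reward, nxt


def relaxed_bound(state: State, steps_left: int) -> float:
    """Admissible upper bound: k steps from now no uncertainty can exceed
    max(state) + k, so sum that optimistic reward over the remaining steps."""
    m = max(state)
    return float(steps_left * m + steps_left * (steps_left - 1) // 2)


def branch_and_bound(
    state: State,
    horizon: int,
    upper_bound: Callable[[State, int], float],
) -> Tuple[float, int]:
    """Return (best total reward, number of nodes expanded)."""
    best = float("-inf")
    expanded = 0

    def dfs(s: State, steps_left: int, acc: float) -> None:
        nonlocal best, expanded
        if steps_left == 0:
            best = max(best, acc)
            return
        expanded += 1
        children = [step(s, a) for a in actions(s)]
        # Try high-reward children first so a good incumbent is found early.
        for r, s2 in sorted(children, key=lambda c: -c[0]):
            # Prune: even the optimistic bound cannot beat the incumbent.
            if acc + r + upper_bound(s2, steps_left - 1) <= best:
                continue
            dfs(s2, steps_left - 1, acc + r)

    dfs(state, horizon, 0.0)
    return best, expanded


def no_bound(state: State, steps_left: int) -> float:
    """Trivial bound that never prunes, giving exhaustive search."""
    return float("inf")
```

Calling `branch_and_bound((1, 2, 3), 5, relaxed_bound)` and `branch_and_bound((1, 2, 3), 5, no_bound)` should return the same best value, since the bound is admissible, while the bounded search expands no more nodes than the exhaustive one. A tighter bound prunes more of the tree, which is the motivation for deriving one from a relaxed problem.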

