SY May 17, 2017

Utility Maximizing Sequential Sensing Over a Finite Horizon

Lorenzo Ferrari, Qing Zhao, Anna Scaglione

arXiv:1705.05960, 10 citations, h-index: 58

AI Analysis

For researchers in sequential decision-making, this paper offers a low-complexity policy for a sequential resource allocation problem, with applications in opportunistic spectrum access and marketing.

The paper addresses optimal sequential sensing and exploitation of multiple binary-state resources over a finite horizon, formulating it as a POMDP. The proposed low-complexity policy achieves near-optimal performance in simulations.

We consider the problem of optimally utilizing $N$ resources, each in an unknown binary state. The state of each resource can be inferred from state-dependent noisy measurements. Depending on its state, utilizing a resource results in either a reward or a penalty per unit time. The objective is to find a sequential strategy that governs the sensing and exploitation decisions at each time so as to maximize the expected utility (i.e., total reward minus total penalty and sensing cost) over a finite horizon $L$. We formulate the problem as a Partially Observable Markov Decision Process (POMDP) and show that the optimal strategy is based on two time-varying thresholds for each resource and an optimal selection rule for which resource to sense. Since a full characterization of the optimal strategy is generally intractable, we develop a low-complexity policy that is shown by simulations to offer near-optimal performance. This problem finds applications in opportunistic spectrum access, marketing strategies, and other sequential resource allocation problems.
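
To make the two-threshold structure concrete, here is a minimal Python sketch of a belief update for one binary-state resource followed by a threshold decision. It is an illustration under stated assumptions, not the authors' policy: the Gaussian measurement model, the function names, the fixed threshold values, and the reading of the lower threshold as "stop sensing and leave the resource unused" are all hypothetical. In the paper the thresholds vary with time and a separate rule selects which resource to sense; neither is modeled here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (assumed measurement noise model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, measurement, mean0=0.0, mean1=1.0, std=1.0):
    """Bayes update of P(state = 1) after one noisy measurement.

    mean0, mean1 and std are hypothetical parameters: the measurement is
    assumed Gaussian with a mean that depends on the hidden binary state.
    """
    like1 = gaussian_pdf(measurement, mean1, std)
    like0 = gaussian_pdf(measurement, mean0, std)
    numerator = belief * like1
    return numerator / (numerator + (1.0 - belief) * like0)


def decide(belief, lower, upper):
    """Two-threshold rule on the belief (thresholds are illustrative).

    At or above `upper`: exploit the resource.
    At or below `lower`: stop sensing and leave it unused (assumed meaning).
    In between: take another measurement.
    """
    if belief >= upper:
        return "exploit"
    if belief <= lower:
        return "discard"
    return "sense"


if __name__ == "__main__":
    belief = 0.5
    for y in (1.2, 0.9, 1.4):  # made-up measurements for illustration
        belief = update_belief(belief, y)
        print(round(belief, 3), decide(belief, lower=0.2, upper=0.8))
```

A measurement that is equally likely under both states leaves the belief unchanged, while measurements closer to `mean1` push it toward the upper threshold. A time-varying version would replace the constants `lower` and `upper` with values indexed by the remaining horizon.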
