An Online Learning Approach to Optimizing Time-Varying Costs of AoI
This work addresses the challenge of unknown and adversarial costs in real-time monitoring systems, offering incremental improvements to online learning for restless multi-armed bandits.
The paper tackles the problem of optimizing time-varying costs for timely monitoring in communication networks by designing online learning algorithms that achieve sublinear regret compared to fixed and dynamic policies, with applications in mobility tracking showing performance benefits over oblivious scheduling.
We consider systems that require timely monitoring of sources over a communication network, where the cost of delayed information is unknown, time-varying and possibly adversarial. For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight. For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to the best fixed scheduling policy in hindsight, while remaining computationally feasible. The algorithm and its regret analysis are novel and of independent interest to the study of online restless multi-armed bandit problems. We further design algorithms that achieve sublinear regret compared to the best dynamic policy when the environment is slowly varying. Finally, we apply our algorithms to a mobility tracking problem. We consider non-stationary and adversarial mobility models and illustrate the performance benefit of using our online learning algorithms compared to an oblivious scheduling policy.