LGNov 2, 2021

Nonstochastic Bandits and Experts with Arm-Dependent Delays

Dirk van der Hoeven, Nicolò Cesa-Bianchi

arXiv:2111.01589v213.620 citations

Originality Incremental advance

AI Analysis

This work addresses delays in online learning for applications like recommendation systems, offering incremental improvements by extending existing frameworks to handle arm-dependent delays.

The paper tackles the problem of nonstochastic bandits and experts with arm-dependent delays, designing algorithms that achieve first-order regret bounds dependent only on the best arm's losses and delays, with results including bounds for full-information and bandit settings.

We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays only depend on time has been extensively studied, the arm-dependent delay setting better captures real-world applications at the cost of introducing new technical challenges. In the full information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound also for the bandit setting, when the learner is allowed to observe how many losses are missing. These are the first bounds in the delayed setting that depend on the losses and delays of the best arm only. When in the bandit setting no information other than the losses is observed, we still manage to prove a regret bound through a modification to the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.

View on arXiv PDF

Similar