MLLGNov 27, 2018

Rotting bandits are not harder than stochastic ones

arXiv:1811.11043v24 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of non-stationary rewards in applications like recommendation systems, showing that rotting bandits are not inherently harder than stochastic ones, though it is an incremental improvement on existing methods.

The paper tackles the problem of rotting bandits, where rewards only decrease over time, by introducing the FEWA algorithm, which achieves a problem-dependent regret bound of O(log(KT)) and a problem-independent bound of O(sqrt(KT)), substantially improving over prior work and matching bounds for stochastic bandits.

In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever is selected, i.e., rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)}),$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes