LGFeb 16, 2023

Linear Bandits with Memory: from Rotting to Rising

arXiv:2302.08345v23 citationsh-index: 15
AI Analysis

This addresses the challenge of modeling satiation effects in recommendations for practitioners using linear bandits, offering a novel approach with theoretical guarantees, though it is incremental in extending existing bandit frameworks.

The paper tackles the problem of nonstationary phenomena in linear bandits by introducing a model where current rewards depend on past actions in a fixed window, capturing rotting or rising effects, and achieves a regret bound of order sqrt(d)*(m+1)^(1/2+max{γ,0})*T^(3/4) (ignoring logs) against the optimal action sequence.

Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size $m \ge 0$, and an exponent $γ$ that captures the rotting ($γ< 0)$ or rising ($γ> 0$) nature of the phenomenon. When both $m$ and $γ$ are known, we propose and analyze a variant of OFUL which minimizes regret against cycling policies. By choosing the cycle length so as to trade-off approximation and estimation errors, we then prove a bound of order $\sqrt{d}\,(m+1)^{\frac{1}{2}+\max\{γ,0\}}\,T^{3/4}$ (ignoring log factors) on the regret against the optimal sequence of actions, where $T$ is the horizon and $d$ is the dimension of the linear action space. Through a bandit model selection approach, our results are extended to the case where $m$ and $γ$ are unknown. Finally, we complement our theoretical results with experiments against natural baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes