LG | Jan 17, 2024

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

DeepMind, Princeton
arXiv:2401.09278v1 | h-index: 64 | ICLR
Originality: Highly original
AI Analysis

This work addresses rapid adaptation in online learning under fast-changing states. It shows that allowing a second query per round circumvents the almost-linear regret lower bound that holds when only one query per round is allowed.

The paper tackles online optimization in volatile environments by introducing a bandit algorithm that achieves optimal strongly adaptive regret with only two queries per round. For multi-armed bandits with $n$ arms, it attains $\tilde{O}(\sqrt{n|I|})$ regret on any contiguous interval $I$, a bound that is tight and cannot be improved in general.
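For reference, the strongly adaptive regret named above can be written out as follows. This follows the standard definition from Daniely et al.; the symbols $T$ (horizon), $k$ (interval length), $\ell_t$ (loss at round $t$), and $i_t$ (arm played at round $t$) are notation assumed here for illustration and may differ from the paper's own.

$$
\mathrm{SA\text{-}Regret}_T(k) \;=\; \max_{I=[s,\,s+k-1]\subseteq[1,T]} \left( \sum_{t\in I} \ell_t(i_t) \;-\; \min_{i\in[n]} \sum_{t\in I} \ell_t(i) \right)
$$

Under this notation, the stated result is that the expected regret on every interval $I$ is $\tilde{O}(\sqrt{n|I|})$, so the guarantee scales with the interval length $|I|$ rather than with the full horizon $T$.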

Fast changing states or volatile environments pose a significant challenge to online optimization, which needs to perform rapid adaptation under limited observation. In this paper, we give query and regret optimal bandit algorithms under the strict notion of strongly adaptive regret, which measures the maximum regret over any contiguous interval $I$. Due to its worst-case nature, there is an almost-linear $\Omega(|I|^{1-\epsilon})$ regret lower bound when only one query per round is allowed [Daniely et al., ICML 2015]. Surprisingly, with just two queries per round, we give Strongly Adaptive Bandit Learner (StABL) that achieves $\tilde{O}(\sqrt{n|I|})$ adaptive regret for multi-armed bandits with $n$ arms. The bound is tight and cannot be improved in general. Our algorithm leverages a multiplicative update scheme of varying stepsizes and a carefully chosen observation distribution to control the variance. Furthermore, we extend our results and provide optimal algorithms in the bandit convex optimization setting. Finally, we empirically demonstrate the superior performance of our algorithms under volatile environments and for downstream tasks, such as algorithm selection for hyperparameter optimization.
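To make the "two queries per round" idea concrete, here is a minimal sketch of a multiplicative-weights learner that plays one arm and separately observes a second arm. This is not StABL: it uses a single fixed step size `eta` rather than varying step sizes, and it assumes a uniform observation distribution, which the abstract does not specify. The function name `two_query_exp_weights` and all parameters are hypothetical.

```python
import numpy as np

def two_query_exp_weights(losses, eta, seed=0):
    """Illustrative two-query multiplicative-weights learner (not StABL).

    losses: array of shape (T, n) with entries in [0, 1].
    Returns the final arm distribution and the total loss incurred.
    """
    rng = np.random.default_rng(seed)
    T, n = losses.shape
    log_w = np.zeros(n)  # log-weights, for numerical stability
    total_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        # Query 1: play an arm drawn from the current distribution.
        played = rng.choice(n, p=p)
        total_loss += losses[t, played]
        # Query 2 (assumption): observe an arm drawn uniformly at random.
        observed = rng.integers(n)
        # Importance-weighted estimate n * loss is unbiased, and its
        # second moment is at most n regardless of p.
        log_w[observed] -= eta * n * losses[t, observed]
    p = np.exp(log_w - log_w.max())
    return p / p.sum(), total_loss
```

In this sketch the loss of the played arm only counts toward the incurred loss, while the weight update relies entirely on the second query. Decoupling observation from play is what keeps the variance of the estimate independent of how concentrated `p` has become.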

