LGMLOct 23, 2020

Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

arXiv:2010.12642v265 citations
Originality Highly original
AI Analysis

This work addresses the challenge of non-linearity in parametrized bandits for researchers and practitioners, offering significant theoretical improvements over prior methods.

The paper tackles the problem of logistic bandits by introducing a novel algorithm that improves regret guarantees, achieving a minimax-optimal upper-bound of $ ilde{\mathcal{O}}(d\sqrt{T/\kappa})$ in favorable cases, which dramatically improves over the previous state-of-the-art of $ ilde{\mathcal{O}}(d\sqrt{T}+\kappa)$.

Logistic Bandits have recently attracted substantial attention, by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively) problem-dependent constant $κ$, characterizing the magnitude of the reward's non-linearity. In this paper we introduce a novel algorithm for which we provide a refined analysis. This allows for a better characterization of the effect of non-linearity and yields improved problem-dependent guarantees. In most favorable cases this leads to a regret upper-bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/κ})$, which dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+κ)$ state-of-the-art guarantees. We prove that this rate is minimax-optimal by deriving a $Ω(d\sqrt{T/κ})$ problem-dependent lower-bound. Our analysis identifies two regimes (permanent and transitory) of the regret, which ultimately re-conciliates Faury et al. (2020) with the Bayesian approach of Dong et al. (2019). In contrast to previous works, we find that in the permanent regime non-linearity can dramatically ease the exploration-exploitation trade-off. While it also impacts the length of the transitory phase in a problem-dependent fashion, we show that this impact is mild in most reasonable configurations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes