LGMLMar 5, 2024

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

arXiv:2403.03219v32 citationsh-index: 16AISTATS
Originality Incremental advance
AI Analysis

This work provides a robust algorithm for sequential decision-making in contextual bandits, applicable to scenarios like online advertising or recommendation systems, though it is incremental as it builds on existing FTRL and Tsallis entropy methods.

The paper tackles the linear contextual bandit problem by developing a Best-of-Both-Worlds algorithm that achieves logarithmic regret in stochastic regimes and square-root regret in adversarial regimes, with extensions to more general margin conditions.

We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $α$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $β\in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+β}{2+β}}T^{\frac{1}{2+β}}\right)$ regret under the margin condition.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes