MLLGNov 14, 2023

Ensemble sampling for linear bandits: small ensembles suffice

arXiv:2311.08376v45 citationsh-index: 22
Originality Highly original
AI Analysis

This provides a more efficient algorithm for sequential decision-making in bandit problems, addressing a key bottleneck in ensemble methods.

The paper tackles the problem of ensemble sampling in stochastic linear bandits by showing that an ensemble size of order d log T suffices to achieve near-√T regret, which is the first result to avoid linear scaling with T and allow infinite action sets.

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\smash{\sqrt{T}}$ order regret. Our result is also the first to allow for infinite action sets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes