ML · LG · PR · Mar 6, 2025

A characterization of sample adaptivity in UCB data

arXiv:2503.04855v1 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This paper provides theoretical insight into UCB-type bandit algorithms, but it is an incremental advance because it builds on existing UCB analysis.

The paper characterizes sample adaptivity in UCB algorithms for stochastic two-armed bandits by deriving a joint central limit theorem (CLT) for the number of pulls and the sample mean rewards of the arms, with implications for pseudo-regret and sample bias.
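The step from a CLT for the number of pulls to one for the pseudo-regret follows directly from the definitions: with two arms, every round spent on the worse arm costs exactly the gap, so the (random) pseudo-regret is a fixed multiple of the worse arm's pull count. The block below uses generic notation chosen for illustration, which may differ from the paper's: arms k ∈ {1, 2} with means μ_1 > μ_2, gap Δ = μ_1 − μ_2 > 0, A_t the arm pulled at round t, and X_t the reward observed. The sequences b_T and a_T > 0 are hypothetical placeholders for a centering and a scaling, not the specific ones derived in the paper (which, per the abstract, change form between the large-gap and small-gap regimes).

```latex
\begin{align*}
N_k(T) &= \sum_{t=1}^{T} \mathbf{1}\{A_t = k\}, \qquad
\hat\mu_k(T) = \frac{1}{N_k(T)} \sum_{t=1}^{T} \mathbf{1}\{A_t = k\}\, X_t, \\
\bar{R}_T &= \sum_{t=1}^{T} \bigl(\mu_1 - \mu_{A_t}\bigr)
           = \sum_{t=1}^{T} \Delta\, \mathbf{1}\{A_t = 2\}
           = \Delta\, N_2(T), \\
\frac{N_2(T) - b_T}{a_T} \Rightarrow Z
&\quad\Longrightarrow\quad
\frac{\bar{R}_T - \Delta\, b_T}{\Delta\, a_T} \Rightarrow Z .
\end{align*}
```

Because \bar{R}_T is a deterministic rescaling of N_2(T), any limit law for the number of pulls carries over to the pseudo-regret unchanged in shape.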

We characterize a joint CLT of the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. This result has several implications: (1) a nonstandard CLT for the number of pulls, and hence for the pseudo-regret, that smoothly interpolates between a standard form in the large-arm-gap regime and a slow-concentration form in the small-arm-gap regime; and (2) a heuristic derivation of the sample bias, up to its leading order, from the correlation between the number of pulls and the sample means. Our analysis framework is based on a novel perturbation analysis, which is of independent interest.
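To make the quantities in the abstract concrete, here is a minimal Monte Carlo sketch. It is not the paper's code or its perturbation framework. Everything in it is an assumption chosen for illustration: unit-variance Gaussian rewards, a UCB1-style index (sample mean + sqrt(2 log t / n)), horizon T = 200, 1000 replications, and the two gap values. It records the pull counts N_k(T) and sample means across runs, reports the spread of the worse arm's pulls (and hence of the pseudo-regret, which equals the gap times those pulls), and estimates the sample bias two ways. The first way is direct. The second uses the exact identity bias_k = −Cov(N_k, μ̂_k) / E[N_k]. That identity follows from Wald's identity: since N_k(T) ≤ T is bounded and each arm is pulled at least once, E[N_k(μ̂_k − μ_k)] = 0. It is a standard fact that shows why the bias is tied to the pulls/means correlation, and is not claimed to be the paper's leading-order derivation.

```python
import math

import numpy as np


def run_ucb(mu, T, rng):
    """One run of a UCB1-style rule on a two-armed Gaussian bandit.

    Hypothetical setup: unit-variance Gaussian rewards and the index
    sample_mean + sqrt(2 log t / n); the paper's exact index may differ.
    Returns (pulls, sample_means) as two lists of length 2.
    """
    pulls = [0, 0]
    sums = [0.0, 0.0]
    noise = rng.standard_normal(T)  # round-t noise, independent of the past
    for t in range(1, T + 1):
        if t <= 2:
            arm = t - 1  # pull each arm once so every sample mean is defined
        else:
            index = [
                sums[k] / pulls[k] + math.sqrt(2.0 * math.log(t) / pulls[k])
                for k in range(2)
            ]
            arm = 0 if index[0] >= index[1] else 1
        pulls[arm] += 1
        sums[arm] += mu[arm] + noise[t - 1]
    return pulls, [sums[k] / pulls[k] for k in range(2)]


def simulate(mu, T, reps, seed=0):
    """Monte Carlo replications; returns arrays N (pulls) and M (sample means)."""
    rng = np.random.default_rng(seed)
    N = np.empty((reps, 2))
    M = np.empty((reps, 2))
    for r in range(reps):
        N[r], M[r] = run_ucb(mu, T, rng)
    return N, M


T, reps = 200, 1000
results = {}
for gap in (1.0, 0.1):  # "large" vs "small" gap, chosen arbitrarily
    mu = (gap, 0.0)  # first arm is optimal, second arm is suboptimal
    N, M = simulate(mu, T, reps)
    regret = gap * N[:, 1]  # pseudo-regret = gap * pulls of the worse arm
    bias = M.mean(axis=0) - np.array(mu)
    # Wald's identity: E[N_k (mean_k - mu_k)] = 0, so bias_k = -Cov(N_k, mean_k) / E[N_k]
    cov = np.array([np.cov(N[:, k], M[:, k])[0, 1] for k in range(2)])
    bias_via_cov = -cov / N.mean(axis=0)
    results[gap] = (N, M, bias, bias_via_cov)
    print(f"gap={gap}: N_2 mean={N[:, 1].mean():.1f} sd={N[:, 1].std():.1f}, "
          f"pseudo-regret mean={regret.mean():.2f}")
    print(f"  bias (direct)    = {np.round(bias, 4)}")
    print(f"  bias (-Cov/E[N]) = {np.round(bias_via_cov, 4)}")
```

The two bias columns should agree up to Monte Carlo error. Varying the gap or T is one way to look at how the spread of N_2(T) changes between regimes; no particular numbers are claimed here.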

