Optimism Stabilizes Thompson Sampling for Adaptive Inference

arXiv:2602.06014v1 · 1 citation
Originality: Highly original
AI Analysis

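The paper resolves an open question in adaptive inference for bandit algorithms: whether the stability of variance-inflated Thompson sampling extends from the two-armed setting to any number of arms. This supports reliable statistical analysis of adaptively collected data in applications such as clinical trials or recommendation systems.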

The paper addresses valid asymptotic inference under Thompson sampling for multi-armed bandits. It identifies optimism as a key mechanism for stabilizing the algorithm and proves that both the variance-inflated and the mean-bonus modifications achieve stability for any number of arms, at only a mild additional regret cost.

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study this phenomenon in the $K$-armed Gaussian bandit and identify \emph{optimism} as a key mechanism for restoring \emph{stability}, a sufficient condition for valid asymptotic inference requiring each arm's pull count to concentrate around a deterministic scale. First, we prove that variance-inflated TS \citep{halder2025stable} is stable for any $K \ge 2$, including the challenging regime where multiple arms are optimal. This resolves the open question raised by \citet{halder2025stable} by extending their results from the two-armed setting to the general $K$-armed setting. Second, we analyze an alternative optimistic modification that keeps the posterior variance unchanged but adds an explicit mean bonus to the posterior mean, and establish the same stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid inference in multi-armed bandits, while incurring only a mild additional regret cost.
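
The two optimistic modifications described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function name `optimistic_ts`, the constant `inflation` factor applied to the posterior variance $1/n$, and the `bonus / sqrt(n)` form of the mean bonus are all hypothetical choices made for illustration, since the abstract does not state the exact schedules. Rewards are assumed Gaussian with unit variance, and each arm is pulled once to initialize.

```python
import numpy as np


def optimistic_ts(means, horizon, inflation=1.0, bonus=0.0, seed=0):
    """Gaussian Thompson sampling with two illustrative optimism knobs.

    inflation: multiplies the posterior variance 1/n (assumed constant here).
    bonus:     adds bonus / sqrt(n) to the posterior mean (assumed form).
    Setting inflation=1.0 and bonus=0.0 gives plain Gaussian TS.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    k = means.size

    # Initialization: pull every arm once with unit-variance Gaussian noise.
    counts = np.ones(k)
    sums = rng.normal(means, 1.0)

    for _ in range(horizon - k):
        post_mean = sums / counts
        post_sd = np.sqrt(inflation / counts)          # variance-inflated posterior
        shifted = post_mean + bonus / np.sqrt(counts)  # explicit mean bonus
        draws = rng.normal(shifted, post_sd)           # one posterior draw per arm
        arm = int(np.argmax(draws))

        sums[arm] += rng.normal(means[arm], 1.0)
        counts[arm] += 1

    # Pull counts and per-arm sample means, the quantities inference is based on.
    return counts, sums / counts


if __name__ == "__main__":
    # Two tied optimal arms and one suboptimal arm.
    counts, estimates = optimistic_ts([0.0, 0.0, -0.5], horizon=2000, inflation=4.0)
    print("pull counts:", counts)
    print("sample means:", estimates)
```

Running the sketch over many seeds and comparing how the pull counts of the tied arms spread out under `inflation=1.0` versus a larger value gives an informal feel for the stability notion, though any such simulation is only suggestive and does not substitute for the paper's asymptotic results.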

