LGSTMay 19, 2021

Diffusion Approximations for Thompson Sampling in the Small Gap Regime

arXiv:2105.09232v522 citations
Originality Incremental advance
AI Analysis

This provides theoretical insights for researchers in bandit algorithms, though it is incremental as it extends existing convergence results to a specific regime.

The paper tackles the analysis of Thompson sampling and other sampling-based bandit algorithms in the small gap regime, showing that their process-level dynamics converge to stochastic differential equations and that their regret performance is insensitive to model mis-specification.

We study the process-level dynamics of Thompson sampling in the ``small gap'' regime. The small gap regime is one in which the gaps between the arm means are of order $\sqrtγ$ or smaller and the time horizon is of order $1/γ$, where $γ$ is small. As $γ\downarrow 0$, we show that the process-level dynamics of Thompson sampling converge weakly to the solutions to certain stochastic differential equations and stochastic ordinary differential equations. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, can handle stationary, weakly dependent reward processes, and can also be adapted to analyze a variety of sampling-based bandit algorithms. Indeed, we show that the process-level dynamics of many sampling-based bandit algorithms -- including Thompson sampling designed for any single-parameter exponential family of rewards, as well as non-parametric bandit algorithms based on bootstrap re-sampling -- satisfy an invariance principle. Namely, their weak limits coincide with that of Gaussian parametric Thompson sampling with Gaussian priors. Moreover, in the small gap regime, the regret performance of these algorithms is generally insensitive to model mis-specification, changing continuously with increasing degrees of mis-specification.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes