LGSTNov 8, 2020

Asymptotic Convergence of Thompson Sampling

arXiv:2011.03917v17 citations
Originality Incremental advance
AI Analysis

This work addresses the theoretical understanding of Thompson sampling's long-term convergence for researchers in online learning and bandit problems, but it is incremental as it builds on prior finite-time analyses.

The paper tackles the underexplored asymptotic behavior of Thompson sampling by proving that under a sub-linear Bayesian regret assumption, the agent's actions provide a strongly consistent estimator of the optimal action, relying on the martingale structure of the method.

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad range of probabilistic settings. However its asymptotic behavior remains mostly underexplored. In this paper, we prove an asymptotic convergence result for Thompson sampling under the assumption of a sub-linear Bayesian regret, and show that the actions of a Thompson sampling agent provide a strongly consistent estimator of the optimal action. Our results rely on the martingale structure inherent in Thompson sampling.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes