ML, LG, ME | Oct 30, 2019

Thompson Sampling via Local Uncertainty

arXiv:1910.13673v3 | 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the exploration-exploitation dilemma in contextual bandits, offering an incremental improvement over existing methods by focusing on local uncertainty.

The paper aims to improve Thompson sampling for sequential decision making. It proposes a probabilistic modeling framework that samples mean rewards using local latent variable uncertainty, and it reports state-of-the-art performance on eight contextual bandit benchmark datasets at low computational cost.

Thompson sampling is an efficient algorithm for sequential decision making, which exploits the posterior uncertainty to address the exploration-exploitation dilemma. There has been significant recent interest in integrating Bayesian neural networks into Thompson sampling. Most of these methods rely on global variable uncertainty for exploration. In this paper, we propose a new probabilistic modeling framework for Thompson sampling, where local latent variable uncertainty is used to sample the mean reward. Variational inference is used to approximate the posterior of the local variable, and a semi-implicit structure is further introduced to enhance its expressiveness. Our experimental results on eight contextual bandit benchmark datasets show that Thompson sampling guided by local uncertainty achieves state-of-the-art performance while having low computational complexity.
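The action-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear-Gaussian approximate posterior q(z | x) over a local latent variable z and a linear decoder that maps (z, x) to one mean reward per arm. The names `select_arm`, `enc_mu`, `enc_log_sigma`, and `dec` are hypothetical, and the parameters are random placeholders rather than values fitted by variational inference.

```python
import numpy as np

def select_arm(context, enc_mu, enc_log_sigma, dec, rng):
    """One Thompson-sampling step driven by a local latent variable.

    Assumed (hypothetical) model, chosen only for illustration:
      q(z | x) = Normal(enc_mu @ x, exp(enc_log_sigma @ x)^2)
      mean reward of each arm = dec @ [z, x]
    """
    mu = enc_mu @ context                       # mean of q(z | x)
    sigma = np.exp(enc_log_sigma @ context)     # std of q(z | x), always positive
    # Sample the local latent variable (reparameterized Gaussian draw).
    z = mu + sigma * rng.standard_normal(mu.shape)
    # Sampled mean reward for every arm, given this draw of z.
    mean_rewards = dec @ np.concatenate([z, context])
    # Act greedily with respect to the sampled mean rewards.
    return int(np.argmax(mean_rewards)), mean_rewards

# Usage with placeholder parameters (a real agent would learn these).
rng = np.random.default_rng(0)
context_dim, latent_dim, n_arms = 4, 3, 5
enc_mu = rng.normal(size=(latent_dim, context_dim))
enc_log_sigma = np.zeros((latent_dim, context_dim))
dec = rng.normal(size=(n_arms, latent_dim + context_dim))
x = rng.normal(size=context_dim)
arm, sampled_means = select_arm(x, enc_mu, enc_log_sigma, dec, rng)
print("chosen arm:", arm)
```

Here the randomness that drives exploration comes from the draw of z for the current context, not from sampling the model's weights, which is the distinction between local and global uncertainty that the abstract draws.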

Code Implementations: 1 repo
