MLLGNov 3, 2013

Thompson Sampling for Online Learning with Linear Experts

arXiv:1311.0468v12 citations
Originality Synthesis-oriented
AI Analysis

This is an incremental improvement for online learning researchers, adapting Thompson sampling to a known setting with theoretical guarantees.

The paper tackles the problem of online linear generalization with full information by presenting a Thompson sampling algorithm, showing it reduces to Follow-the-Perturbed-Leader with Gaussian noise and achieves sqrt(T) regret bounds.

In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005. The algorithm uses a Gaussian prior and time-varying Gaussian likelihoods, and we show that it essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy, with exponentially distributed noise replaced by Gaussian noise. This implies sqrt(T) regret bounds for Thompson sampling (with time-varying likelihood) for online learning with full information.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes