LGAIMar 24, 2015

A Note on Information-Directed Sampling and Thompson Sampling

arXiv:1503.06902v11 citations
Originality Synthesis-oriented
AI Analysis

This is an incremental note that clarifies existing algorithms for researchers in bandit problems.

The paper explains and compares three Bayesian multi-armed bandit algorithms—Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling—by providing intuitive explanations and derivations for their regret bounds.

This note introduce three Bayesian style Multi-armed bandit algorithms: Information-directed sampling, Thompson Sampling and Generalized Thompson Sampling. The goal is to give an intuitive explanation for these three algorithms and their regret bounds, and provide some derivations that are omitted in the original papers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes