LGMar 21, 2014

An Information-Theoretic Analysis of Thompson Sampling

arXiv:1403.5341v2462 citations

Originality Incremental advance

AI Analysis

This work offers a theoretical improvement for researchers in online learning and bandit problems, though it appears incremental as it builds on preexisting analysis.

The paper tackles the problem of analyzing Thompson sampling in online optimization with partial feedback, providing an information-theoretic approach that leads to regret bounds scaling with the entropy of the optimal-action distribution, strengthening existing results.

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.

View on arXiv PDF

Similar