ML · LG · Jul 1, 2015

Bootstrapped Thompson Sampling and Deep Exploration

arXiv:1507.00300v1 · 113 citations
Originality: Incremental advance
AI Analysis

This paper addresses a computational bottleneck for researchers and practitioners who use Thompson sampling with deep learning. The contribution appears incremental, since it builds on existing Thompson sampling methods.

Thompson sampling requires maintaining or sampling from a posterior distribution, which becomes computationally infeasible when exploration is coupled with deep learning. The paper replaces the posterior with a bootstrap over a combination of observed and artificially generated data; the artificial data induces a prior, which the authors argue is critical to effective exploration in multi-armed bandit and reinforcement learning problems.

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
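To make the idea in the abstract concrete, here is a minimal sketch of bootstrap-based exploration on a Bernoulli multi-armed bandit. It is an illustration under stated assumptions, not the authors' algorithm as published: the function names (`bootstrap_mean`, `bootstrapped_thompson`), the choice of one artificial failure and one artificial success per arm, and the random tie-breaking are all hypothetical choices made for this example.

```python
import random


def bootstrap_mean(data, rng):
    """Mean of a same-size resample of `data`, drawn with replacement."""
    resample = [rng.choice(data) for _ in data]
    return sum(resample) / len(resample)


def bootstrapped_thompson(true_means, horizon, artificial=(0.0, 1.0), seed=0):
    """Run a bootstrap-based bandit policy and return pull counts per arm.

    `artificial` is the assumed artificial data seeded into every arm's
    history; it plays the role of a prior and is never discarded.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Each arm's history starts with artificial data only.
    history = [list(artificial) for _ in range(n_arms)]
    pulls = [0] * n_arms

    for _ in range(horizon):
        # One bootstrap estimate per arm stands in for a posterior sample.
        estimates = [bootstrap_mean(h, rng) for h in history]
        best = max(estimates)
        # Break ties uniformly at random (an assumption of this sketch).
        arm = rng.choice([a for a, e in enumerate(estimates) if e == best])
        # Simulated Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(bootstrapped_thompson([0.1, 0.9], horizon=500))
```

Without the artificial data, an arm whose first observed rewards are all zero would produce a bootstrap estimate of exactly zero on every resample and could be abandoned permanently. Keeping an artificial success in each history leaves every arm some chance of a positive estimate, which is one way to read the abstract's claim that the induced prior is critical to exploration.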

