MLOct 16, 2017

Fully adaptive algorithm for pure exploration in linear bandits

arXiv:1710.05552v192 citations
Originality Highly original
AI Analysis

This addresses the challenge of efficiently identifying the best arm in linear bandit settings, which is incremental as it builds on prior work by introducing full adaptivity.

The paper tackles the problem of pure exploration in linear bandits by proposing the first fully-adaptive algorithm that adjusts arm selection based on past observations, showing it matches the lower bound in an extreme case and achieves vast improvement over existing methods in simulations.

We propose the first fully-adaptive algorithm for pure exploration in linear bandits---the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly. While existing methods partially or entirely fix sequences of arm selections before observing rewards, our method adaptively changes the arm selection strategy based on past observations at each round. We show our sample complexity matches the achievable lower bound up to a constant factor in an extreme case. Furthermore, we evaluate the performance of the methods by simulations based on both synthetic setting and real-world data, in which our method shows vast improvement over existing methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes