Fully adaptive algorithm for pure exploration in linear bandits
This addresses the challenge of efficiently identifying the best arm in linear bandit settings, which is incremental as it builds on prior work by introducing full adaptivity.
The paper tackles the problem of pure exploration in linear bandits by proposing the first fully-adaptive algorithm that adjusts arm selection based on past observations, showing it matches the lower bound in an extreme case and achieves vast improvement over existing methods in simulations.
We propose the first fully-adaptive algorithm for pure exploration in linear bandits---the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly. While existing methods partially or entirely fix sequences of arm selections before observing rewards, our method adaptively changes the arm selection strategy based on past observations at each round. We show our sample complexity matches the achievable lower bound up to a constant factor in an extreme case. Furthermore, we evaluate the performance of the methods by simulations based on both synthetic setting and real-world data, in which our method shows vast improvement over existing methods.