Gradient Ascent for Active Exploration in Bandit Problems
This work addresses the problem of efficient exploration in bandit settings for researchers and practitioners in machine learning, offering a computationally efficient solution with proven optimality.
The paper tackles the Active Exploration bandit problem in the fixed confidence setting, including Best Arm Identification and Thresholding Bandits, by proposing a new gradient ascent-based algorithm with a sampling rule using online lazy mirror ascent, and proves it is asymptotically optimal and computationally efficient.
We present a new algorithm based on an gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well studied problems such that the Best Arm Identification or Thresholding Bandits. It consists of a new sampling rule based on an online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.