ML LGMay 20, 2019

Gradient Ascent for Active Exploration in Bandit Problems

arXiv:1905.08165v111.824 citations

Originality Highly original

AI Analysis

This work addresses the problem of efficient exploration in bandit settings for researchers and practitioners in machine learning, offering a computationally efficient solution with proven optimality.

The paper tackles the Active Exploration bandit problem in the fixed confidence setting, including Best Arm Identification and Thresholding Bandits, by proposing a new gradient ascent-based algorithm with a sampling rule using online lazy mirror ascent, and proves it is asymptotically optimal and computationally efficient.

We present a new algorithm based on an gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well studied problems such that the Best Arm Identification or Thresholding Bandits. It consists of a new sampling rule based on an online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.

View on arXiv PDF

Similar