MLLGMay 20, 2019

Gradient Ascent for Active Exploration in Bandit Problems

arXiv:1905.08165v124 citations
Originality Highly original
AI Analysis

This work addresses the problem of efficient exploration in bandit settings for researchers and practitioners in machine learning, offering a computationally efficient solution with proven optimality.

The paper tackles the Active Exploration bandit problem in the fixed confidence setting, including Best Arm Identification and Thresholding Bandits, by proposing a new gradient ascent-based algorithm with a sampling rule using online lazy mirror ascent, and proves it is asymptotically optimal and computationally efficient.

We present a new algorithm based on an gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well studied problems such that the Best Arm Identification or Thresholding Bandits. It consists of a new sampling rule based on an online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes