MLLGNov 11, 2020

Asymptotically Optimal Information-Directed Sampling

arXiv:2011.05944v439 citations
AI Analysis

This work addresses the challenge of balancing regret and information in bandit algorithms for researchers and practitioners in machine learning, though it is incremental as it builds on existing IDS and primal-dual frameworks.

The paper tackles the problem of designing an asymptotically optimal algorithm for stochastic linear bandits with finitely many actions, achieving both asymptotic optimality and near worst-case optimality in finite time. The result shows that the proposed information-directed sampling (IDS) algorithm is competitive with UCB in finite-time and can be significantly better asymptotically.

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes