LGOCJul 28, 2015

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

arXiv:1507.07880v348 citations
Originality Highly original
AI Analysis

This work addresses a key theoretical and practical challenge in multi-armed bandit algorithms for decision-making under uncertainty.

The paper tackles the problem of achieving both optimal problem-dependent and worst-case regret in stochastic finite-armed bandits, resulting in a simple and efficient algorithm that empirically performs well.

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and empirically superb. The approach is based on UCB, but with a carefully chosen confidence parameter that optimally balances the risk of failing confidence intervals against the cost of excessive optimism.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes