LGSTML · Mar 29, 2016

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

arXiv:1603.08661v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses the minimisation of cumulative regret in finite-armed stochastic bandits. It is relevant to both researchers and practitioners, though it appears incremental because it extends an existing algorithm.

The author introduces an anytime version of the Optimally Confident UCB (OCUCB) algorithm for finite-armed stochastic bandits with subgaussian noise. It achieves the strongest finite-time regret guarantees known at the time for a horizon-free algorithm, and the paper provides a finite-time lower bound that nearly matches the upper bound.

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.
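To make the setting in the abstract concrete, the following is a minimal sketch of a generic anytime (horizon-free) UCB-style index policy on Gaussian arms with unit variance, together with its cumulative pseudo-regret. It is not the OCUCB index from the paper, whose exact form is not stated in this summary; the function name `run_ucb`, the exploration constant `alpha`, and the `seed` parameter are illustrative assumptions.

```python
import math
import random


def run_ucb(means, horizon, seed=0, alpha=2.0):
    """Simulate a generic anytime UCB-style policy (NOT the paper's OCUCB index).

    means   : true arm means (Gaussian rewards with unit variance, an assumption)
    horizon : number of simulated rounds; the policy itself never reads it
    alpha   : hypothetical exploration constant in the confidence bonus
    Returns (pull_counts, cumulative_pseudo_regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # The index uses only the current round t, which is what makes
            # the policy "anytime": no knowledge of the horizon is needed.
            def index(i):
                bonus = math.sqrt(alpha * math.log(t) / counts[i])
                return sums[i] / counts[i] + bonus

            arm = max(range(k), key=index)

        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = run_ucb([0.0, 1.0], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", regret)
```

Because the index depends on the round counter `t` rather than on a fixed horizon, the same policy can be stopped at any time, which is the property the abstract refers to as horizon-free.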

