LG · Nov 9, 2017

Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates

arXiv:1711.03591v1 · 38 citations
Originality: Highly original
AI Analysis

This work addresses the efficiency of decision-making in sequential learning problems, such as online advertising or clinical trials, by providing an incremental improvement to UCB algorithms with theoretical and empirical gains.

The authors tackle the problem of minimizing cumulative regret in stochastic multi-armed bandits by proposing Efficient-UCB-Variance (EUCBV), which combines arm elimination with variance estimates. EUCBV achieves improved gap-dependent and gap-independent regret bounds over existing UCB variants and performs competitively against other algorithms such as Thompson sampling.
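To make "variance estimates" concrete, here is a minimal sketch of a variance-aware upper confidence bound in the style of the UCB-V index of Audibert et al., mean + sqrt(2 · V · ζ log t / s) + c · 3b · ζ log t / s, where V is the arm's empirical variance, s its pull count and rewards lie in [0, b]. The function name `ucbv_index` and the defaults ζ = 1.2, c = 1 are illustrative assumptions; EUCBV's own confidence bound uses different exploration terms that are not reproduced here.

```python
import math
from statistics import pvariance


def ucbv_index(rewards, t, b=1.0, zeta=1.2, c=1.0):
    """UCB-V-style upper confidence bound for one arm (illustrative sketch).

    rewards: observed rewards of this arm, assumed to lie in [0, b]
    t: current round, t >= 1
    zeta, c: exploration constants (assumed defaults, not taken from EUCBV)
    """
    s = len(rewards)
    if s == 0:
        return math.inf  # unplayed arms get priority
    mean = sum(rewards) / s
    var = pvariance(rewards)  # empirical (1/s) variance, as in UCB-V
    explore = zeta * math.log(t)
    # Variance term shrinks for low-variance arms; the second term handles the range b.
    return mean + math.sqrt(2 * var * explore / s) + c * 3 * b * explore / s


# Two arms with the same empirical mean: the low-variance one gets a tighter bound.
steady = ucbv_index([0.5] * 10, t=100)
noisy = ucbv_index([0.0, 1.0] * 5, t=100)
print(steady < noisy)
```

The design point is that two arms with equal means but different spreads receive different exploration bonuses, so a low-variance arm can be resolved with fewer pulls than a Hoeffding-style bound such as UCB1's would require.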

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved \citep{auer2010ucb}, while taking into account the variance estimates to compute the arms' confidence bounds, similar to UCBV \citep{audibert2009exploration}. Through a theoretical analysis, we establish that EUCBV incurs a \emph{gap-dependent} regret bound of $O\left(\frac{K\sigma^2_{\max}\log(T\Delta^2/K)}{\Delta}\right)$ after $T$ trials, where $\Delta$ is the minimal gap between optimal and sub-optimal arms; this bound improves on those of existing state-of-the-art UCB algorithms (such as UCB1, UCB-Improved, UCBV and MOSS). Further, EUCBV incurs a \emph{gap-independent} regret bound of $O\left(\sqrt{KT}\right)$, which improves on those of UCB1, UCBV and UCB-Improved while being comparable with those of MOSS and OCUCB. Through an extensive numerical study, we show that EUCBV significantly outperforms popular UCB variants (such as MOSS and OCUCB) as well as the Thompson sampling and Bayes-UCB algorithms.
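The combination of arm elimination with variance-aware confidence bounds can be illustrated with a simplified successive-elimination loop: every surviving arm is pulled once per sweep, and an arm is dropped once its upper bound falls below the best lower bound among the surviving arms. This is a hedged sketch, not EUCBV itself: the function `variance_elimination`, the Bernoulli reward model, the empirical-Bernstein-style radius sqrt(2 · V · log t / s) + 3 log t / s and the one-pull-per-sweep schedule are all assumptions made for illustration; the paper's exploration constants and round schedule differ.

```python
import math
import random


def variance_elimination(arm_means, horizon, seed=0):
    """Successive elimination with variance-aware confidence radii (illustrative).

    Hypothetical simplification, NOT the exact EUCBV rules.
    Arms are simulated as Bernoulli(arm_means[i]), so rewards lie in [0, 1].
    Returns (cumulative pseudo-regret, indices of arms still active).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k  # running sums keep the variance update O(1)
    active = list(range(k))
    best_mean = max(arm_means)
    regret = 0.0
    t = 0
    while t < horizon:
        # Pull every surviving arm once per sweep.
        for arm in active:
            if t >= horizon:
                break
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            sq_sums[arm] += reward * reward
            regret += best_mean - arm_means[arm]  # pseudo-regret of this pull
            t += 1
        if t >= horizon:
            break  # budget spent; skip elimination on a possibly partial sweep
        log_t = math.log(t)
        bounds = {}
        for arm in active:
            s = pulls[arm]
            mean = sums[arm] / s
            var = max(sq_sums[arm] / s - mean * mean, 0.0)  # empirical (1/s) variance
            # Radius is smaller for low-variance arms, so they are resolved sooner.
            radius = math.sqrt(2 * var * log_t / s) + 3 * log_t / s
            bounds[arm] = (mean - radius, mean + radius)
        # Drop arms whose upper bound is below the best lower bound; the arm
        # attaining that lower bound always survives, so `active` never empties.
        best_lower = max(low for low, _ in bounds.values())
        active = [arm for arm in active if bounds[arm][1] >= best_lower]
    return regret, active


regret, survivors = variance_elimination([0.9, 0.5, 0.1], horizon=3000)
print(round(regret, 1), survivors)
```

For comparison, pulling the three arms uniformly for 3000 rounds costs a pseudo-regret of 1000 · (0.4 + 0.8) = 1200; elimination saves regret precisely by ceasing to pull arms that are confidently sub-optimal.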
