ML LG STDec 12, 2024

Precise Asymptotics and Refined Regret of Variance-Aware UCB

Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

arXiv:2412.08843v212.03 citationsh-index: 31

Originality Incremental advance

AI Analysis

This work addresses the theoretical understanding of variance-aware algorithms in online decision-making, offering incremental improvements to regret analysis.

The paper analyzes the Upper Confidence Bound-Variance (UCB-V) algorithm for Multi-Armed Bandit problems, showing that its arm-pulling rates can be asymptotically unstable and providing non-asymptotic bounds, leading to a refined regret bound.

In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decision-making process. More precisely, we provide an asymptotic characterization of the arm-pulling rates for UCB-V, extending recent results for the canonical UCB in Kalvit and Zeevi (2021) and Khamaru and Zhang (2024). In an interesting contrast to the canonical UCB, our analysis reveals that the behavior of UCB-V can exhibit instability, meaning that the arm-pulling rates may not always be asymptotically deterministic. Besides the asymptotic characterization, we also provide non-asymptotic bounds for the arm-pulling rates in the high probability regime, offering insights into the regret analysis. As an application of this high probability result, we establish that UCB-V can achieve a more refined regret bound, previously unknown even for more complicate and advanced variance-aware online decision-making algorithms.

View on arXiv PDF

Similar