LGMLFeb 19, 2020

Bandits with Mean Bounds

arXiv:2002.08405v52 citations
AI Analysis

This work addresses the problem of improving exploration efficiency in bandit algorithms for researchers and practitioners, though it appears incremental as it builds on existing methods like OFUL and UCB.

The paper tackles the bandit problem by incorporating side information in the form of bounds on arm means to improve regret bounds, developing algorithms like R-OFUL and GLUE that achieve regret upper bounds never worse than standard methods.

We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes