MLLGSTFeb 5, 2019

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

arXiv:1902.01575v251 citations
AI Analysis

This addresses the challenge of non-stationary environments in bandit algorithms for applications like online advertising, though it is incremental as it builds on existing change-point detection methods.

The paper tackles the piecewise-stationary bandit problem by introducing GLR-klUCB, an algorithm that combines kl-UCB with a parameter-free change-point detector, achieving a regret bound of O(√(TAΥ_T log(T))) without prior knowledge of change-points and demonstrating practical efficiency in numerical studies.

We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA Υ_T\log(T)})$ regret in $T$ rounds on some "easy" instances, where A is the number of arms and $Υ_T$ the number of change-points, without prior knowledge of $Υ_T$. In contrast with recently proposed algorithms that are agnostic to $Υ_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes