ML LGJun 25, 2019

Restless dependent bandits with fading memory

Oleksandr Zadorozhnyi, Gilles Blanchard, Alexandra Carpentier

arXiv:1906.10454v11.2

Originality Incremental advance

AI Analysis

This work addresses the problem of bandit optimization under temporal dependencies for researchers in machine learning and statistics, providing theoretical insights into regret bounds in mixing scenarios.

The paper tackles the stochastic multi-armed bandit problem with time-dependent arm samples from weak C-mixing processes, showing that in fast-mixing scenarios, pseudo-regret bounds match those for i.i.d. observations, while in slow-mixing scenarios, an additive term independent of the number of arms appears, with a minmax lower bound matching the upper bound up to a log(T) factor.

We study the stochastic multi-armed bandit problem in the case when the arm samples are dependent over time and generated from so-called weak $\cC$-mixing processes. We establish a $\cC-$Mix Improved UCB agorithm and provide both problem-dependent and independent regret analysis in two different scenarios. In the first, so-called fast-mixing scenario, we show that pseudo-regret enjoys the same upper bound (up to a factor) as for i.i.d. observations; whereas in the second, slow mixing scenario, we discover a surprising effect, that the regret upper bound is similar to the independent case, with an incremental {\em additive} term which does not depend on the number of arms. The analysis of slow mixing scenario is supported with a minmax lower bound, which (up to a $\log(T)$ factor) matches the obtained upper bound.

View on arXiv PDF

Similar