ML LGFeb 23, 2018

On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

arXiv:1802.08380v213.942 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of adapting bandit algorithms to changing environments, which is incremental as it builds on existing methods for non-stationary settings.

The paper tackles non-stationary multiarmed bandit problems by proposing LM-DSEE and SW-UCB# algorithms, showing that their expected cumulative regret is sublinear over time, with numerical results supporting the analysis.

We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.

View on arXiv PDF

Similar