MLLGFeb 13, 2020

Multiscale Non-stationary Stochastic Bandits

arXiv:2002.05289v1
AI Analysis

This addresses the challenge of non-stationary environments in contextual bandits, which is crucial for applications like online advertising and recommendation systems, representing a novel method for a known bottleneck rather than an incremental improvement.

The paper tackles the problem of non-stationary linear bandits, where classic algorithms like LinUCB suffer linear regret due to changing reward distributions, and proposes Multiscale-LinUCB, a novel multiscale changepoint detection method that actively adapts to the environment, with experimental results showing it outperforms state-of-the-art algorithms.

Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression. When the linear regression model is non-stationary over time, the regret of LinUCB can scale linearly with time. In this paper, we propose a novel multiscale changepoint detection method for the non-stationary linear bandit problems, called Multiscale-LinUCB, which actively adapts to the changing environment. We also provide theoretical analysis of regret bound for Multiscale-LinUCB algorithm. Experimental results show that our proposed Multiscale-LinUCB algorithm outperforms other state-of-the-art algorithms in non-stationary contextual environments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes