LGMLSep 19, 2019

Weighted Linear Bandits for Non-Stationary Environments

arXiv:1909.09146v2123 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of adapting to changing environments in online decision-making for applications like recommendation systems, though it is incremental as it builds on existing linear bandit frameworks.

The paper tackles the problem of non-stationary linear bandits, where the unknown regression parameter varies over time, by proposing D-LinUCB, an optimistic algorithm based on discounted linear regression with exponential weights to forget past data. It provides theoretical guarantees, including an optimal dynamic regret bound of order d^{2/3} B_T^{1/3} T^{2/3}, and demonstrates empirical performance in simulations.

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order d^{2/3} B\_T^{1/3}T^{2/3}, where B\_T is a measure of non-stationarity (d and T being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes