LGMar 9, 2021

A Simple Approach for Non-stationary Linear Bandits

arXiv:2103.05324v299 citations
Originality Incremental advance
AI Analysis

This work corrects and improves theoretical guarantees for non-stationary linear bandits, offering a simpler method for researchers and practitioners in online learning.

The paper addresses non-stationary linear bandits by identifying a technical flaw in prior work and proposing a fix that yields an $\widetilde{\mathcal{O}}(T^{3/4}P_T^{1/4})$ dynamic regret, using a simple restarted UCB-type algorithm and achieving the same bound parameter-free via bandits-over-bandits.

This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Existing studies develop various algorithms and show that they enjoy an $\widetilde{\mathcal{O}}(T^{2/3}P_T^{1/3})$ dynamic regret, where $T$ is the time horizon and $P_T$ is the path-length that measures the fluctuation of the evolving unknown parameter. In this paper, we discover that a serious technical flaw makes their results ungrounded, and then present a fix, which gives an $\widetilde{\mathcal{O}}(T^{3/4}P_T^{1/4})$ dynamic regret without modifying original algorithms. Furthermore, we demonstrate that instead of using sophisticated mechanisms, such as sliding window or weighted penalty, a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design an UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of unknown parameters. Our approach enjoys an $\widetilde{\mathcal{O}}(T^{3/4}P_T^{1/4})$ dynamic regret. Note that to achieve this bound, the algorithm requires an oracle knowledge of the path-length $P_T$. Combining the bandits-over-bandits mechanism by treating our algorithm as the base learner, we can further achieve the same regret bound in a parameter-free way. Empirical studies also validate the effectiveness of our approach.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes