Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits
This work addresses the problem of adapting to changing environments in bandit algorithms for researchers and practitioners, but it is incremental as it builds on existing WRLS-based methods with Bayesian enhancements.
The paper tackles non-stationary linear contextual bandits by proposing Weighted Sequential Bayesian (WSB) inference, which maintains a posterior distribution over time-varying parameters and introduces a novel concentration inequality with a prior-dependent term. It results in algorithms like WSB-LinUCB that match best-known guarantees and WSB-RandLinUCB and WSB-LinTS that improve upon them, while preserving computational efficiency.
We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.