MLLGMay 24, 2025

Non-Stationary Lipschitz Bandits

arXiv:2505.18871v21 citationsh-index: 15
Originality Highly original
AI Analysis

This provides the first optimal guarantee for non-stationary Lipschitz bandits, addressing a fundamental challenge in online decision-making under changing environments.

The paper tackles the problem of non-stationary Lipschitz bandits with infinite actions and arbitrarily changing reward functions, achieving a minimax-optimal dynamic regret bound of O~(L^{1/3}T^{2/3}) where L is the number of significant shifts and T is the horizon.

We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\mathcal{\widetilde{O}}(\tilde{L}^{1/3}T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes