LGMLFeb 23, 2023

A Definition of Non-Stationary Bandits

Stanford
arXiv:2302.12202v213 citationsh-index: 55
AI Analysis

This addresses a foundational problem in bandit learning for researchers, offering a more robust theoretical framework, though it is incremental as it refines definitions rather than introducing new methods.

The paper tackles the lack of a formal definition for non-stationary bandits, showing that existing definitions lead to ambiguities and flawed regret notions, and introduces a new definition that resolves these issues and provides a unified framework for both Bayesian and frequentist formulations.

Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; this ambiguity arises in the existing definition's dependence on the latent sequence of reward distributions. Moreover, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. These notions are not indicative of qualitative agent performance in some bandits. Additionally, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Our new definition provides a unified approach, applicable seamlessly to both Bayesian and frequentist formulations of bandits. Furthermore, our definition ensures consistent classification of two bandits offering agents indistinguishable experiences, categorizing them as either both stationary or both non-stationary. This advancement provides a more robust framework for non-stationary bandit learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes