Unifying Clustered and Non-stationary Bandits
This work integrates two independently studied strands of bandit research, non-stationary bandits and online clustering of bandits, to improve adaptability in real-world scenarios; it is an incremental advance.
The paper unifies non-stationary bandits and online clustering of bandits through a test of homogeneity, which handles change detection and cluster identification in a single framework. A rigorous regret analysis and empirical evaluations show the flexibility of the approach.
Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits and provide solutions to many important real-world scenarios. Though the essence of solving these two problems overlaps considerably, they have been studied independently. In this paper, we connect these two strands of bandit research under the notion of a test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits in a unified solution framework. Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution, especially its flexibility in handling various environment assumptions.
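To illustrate how one test can serve both purposes, the following is a minimal sketch, not the paper's algorithm. It assumes a scalar simplification: each data set is a list of rewards with Gaussian noise of known standard deviation, and homogeneity means the two lists share one mean reward. The function names, the 95% threshold, and the simulated data are hypothetical choices made for this example. The same check is applied once to an old and a new window of a single user (change detection) and once to two different users (cluster identification).

```python
import random

# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def homogeneity_statistic(rewards_a, rewards_b, noise_std):
    """Squared, variance-normalized gap between the two sample means.

    Under Gaussian noise with known noise_std, this follows a chi-square
    distribution with 1 degree of freedom when both samples share one mean.
    """
    n_a, n_b = len(rewards_a), len(rewards_b)
    mean_a = sum(rewards_a) / n_a
    mean_b = sum(rewards_b) / n_b
    variance_of_gap = noise_std ** 2 * (1.0 / n_a + 1.0 / n_b)
    return (mean_a - mean_b) ** 2 / variance_of_gap


def is_homogeneous(rewards_a, rewards_b, noise_std, threshold=CHI2_1DOF_95):
    """Return True when the test does not reject 'same underlying mean'."""
    return homogeneity_statistic(rewards_a, rewards_b, noise_std) <= threshold


def sample_rewards(rng, mean, noise_std, n):
    """Hypothetical data generator used only for this illustration."""
    return [rng.gauss(mean, noise_std) for _ in range(n)]


rng = random.Random(0)
noise_std = 0.1

# Change detection: one user, rewards before and after a simulated shift.
old_window = sample_rewards(rng, 0.2, noise_std, 200)
new_window = sample_rewards(rng, 0.8, noise_std, 200)
changed = not is_homogeneous(old_window, new_window, noise_std)

# Cluster identification: two users simulated with the same mean reward.
user_a = sample_rewards(rng, 0.5, noise_std, 200)
user_b = sample_rewards(rng, 0.5, noise_std, 200)
same_cluster = is_homogeneous(user_a, user_b, noise_std)

print("change detected:", changed)
print("users share a cluster:", same_cluster)
```

In this sketch the only difference between the two tasks is which pair of data sets is passed to `is_homogeneous`. A contextual version would compare estimated parameter vectors rather than scalar means, with a threshold chosen for the corresponding degrees of freedom.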