LG · AI · Aug 22, 2024

Identifying the Best Arm in the Presence of Global Environment Shifts

Phurinut Srisawad, Juergen Branke, Long Tran-Thanh

arXiv:2408.12581v1 · 2.6 · h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses a specific challenge in bandit algorithms for dynamic environments. The contribution is incremental, however, because it builds on known settings such as adversarial and corrupted bandits.

The paper tackles best-arm identification in non-stationary bandits with global environmental shifts. It develops a novel selection policy and an allocation policy (LinLUCB) that significantly improve over existing methods in empirical tests.

This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandit setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental changes, given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence and thus do not work well in practice (despite their theoretical guarantees). To overcome this issue, we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests show that our policies significantly outperform existing methods.
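The abstract does not spell out the selection policy or LinLUCB. The sketch below covers only the setting, not the authors' method. It assumes each reward is an arm's base mean plus a shift shared by all arms in the current environment plus Gaussian noise, and that the environment index of every pull is known. Under those assumptions, plain per-arm sample means can rank arms wrongly when arms happen to be pulled in different environments. Fitting an additive "arm effect + environment shift" model by least squares cancels the shared shift. All function names (`simulate`, `naive_means`, `shift_adjusted_effects`) and all numbers are hypothetical illustrations.

```python
import random
from collections import defaultdict


def simulate(base_means, env_shifts, pulls_per_env, noise_sd=0.02, seed=0):
    """Hypothetical global-shift model: in environment e, pulling arm i returns
    base_means[i] + env_shifts[e] + N(0, noise_sd^2). Every arm receives the
    same shift env_shifts[e]. pulls_per_env[e] lists the arms pulled in env e."""
    rng = random.Random(seed)
    return [(arm, env, base_means[arm] + env_shifts[env] + rng.gauss(0.0, noise_sd))
            for env, arms in enumerate(pulls_per_env) for arm in arms]


def naive_means(data, n_arms):
    """Plain per-arm sample means; they absorb whatever shifts each arm happened to see."""
    sums, counts = [0.0] * n_arms, [0] * n_arms
    for arm, _, r in data:
        sums[arm] += r
        counts[arm] += 1
    return [s / c if c else float("-inf") for s, c in zip(sums, counts)]


def shift_adjusted_effects(data, n_arms, n_envs, max_iter=1000, tol=1e-12):
    """Least-squares fit of r ~ a[arm] + c[env] by backfitting (alternating
    per-group means). Only differences between arm effects are identified,
    so c[0] is pinned to 0 after every sweep; this leaves the fitted values
    and the argmax over a unchanged."""
    by_arm, by_env = defaultdict(list), defaultdict(list)
    for arm, env, r in data:
        by_arm[arm].append((env, r))
        by_env[env].append((arm, r))
    if len(by_arm) < n_arms:
        raise ValueError("every arm needs at least one pull")
    a, c = [0.0] * n_arms, [0.0] * n_envs
    for _ in range(max_iter):
        old = a[:]
        for i, obs in by_arm.items():      # arm effect = mean of shift-removed rewards
            a[i] = sum(r - c[e] for e, r in obs) / len(obs)
        for e, obs in by_env.items():      # env shift = mean of arm-removed rewards
            c[e] = sum(r - a[i] for i, r in obs) / len(obs)
        k = c[0]                           # move the unidentified constant into a
        c = [x - k for x in c]
        a = [x + k for x in a]
        if max(abs(x - y) for x, y in zip(a, old)) < tol:
            break
    return a


# Arm 0 is truly best, but arm 1 is mostly pulled in the environment that lifts all arms.
base = [0.5, 0.4]
shifts = [0.0, 1.0]
pulls = [[0] * 9 + [1], [0] + [1] * 9]
data = simulate(base, shifts, pulls)
naive = naive_means(data, 2)
adj = shift_adjusted_effects(data, 2, 2)
print(naive.index(max(naive)), adj.index(max(adj)))  # → 1 0
```

The additive fit identifies only differences between arm effects, and only when the graph linking arms to the environments they were pulled in is connected, which is why `c[0]` is pinned to zero. Estimating each environment's shift from all arms pulled there loosely mirrors the abstract's point about exploiting global shifts across all arms, but this is a plain least-squares estimator, not LinLUCB.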

