Fixed-Budget Change Point Identification in Piecewise Constant Bandits
This addresses a specific challenge in bandit algorithms for change point detection, offering incremental improvements in theoretical guarantees and practical performance.
The paper tackles the problem of locating a change point in piecewise constant bandits with a fixed exploration budget, providing the first non-asymptotic analysis and developing algorithms with near-optimal error bounds for both small and large budget regimes, including a regime-adaptive algorithm validated by experiments.
We study the piecewise constant bandit problem where the expected reward is a piecewise constant function with one change point (discontinuity) across the action space $[0,1]$ and the learner's aim is to locate the change point. Under the assumption of a fixed exploration budget, we provide the first non-asymptotic analysis of policies designed to locate abrupt changes in the mean reward function under bandit feedback. We study the problem under a large and small budget regime, and for both settings establish lower bounds on the error probability and provide algorithms with near matching upper bounds. Interestingly, our results show a separation in the complexity of the two regimes. We then propose a regime adaptive algorithm which is near optimal for both small and large budgets simultaneously. We complement our theoretical analysis with experimental results in simulated environments to support our findings.