LG · Feb 15, 2023

Best Arm Identification for Stochastic Rising Bandits

Marco Mussi, Alessandro Montenegro, Francesco Trovó, Marcello Restelli, Alberto Maria Metelli

arXiv:2302.07510v3 · 8.86 citations · h-index: 38 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses sequential decision-making in scenarios such as online model selection, where the options themselves learn over time. It is an incremental advance: it extends existing Best Arm Identification (BAI) methods to the rising bandit setting.

The paper tackles the fixed-budget Best Arm Identification problem for Stochastic Rising Bandits, where options improve each time they are selected. It proposes two algorithms, R-UCBE and R-SR. Given a sufficiently large budget, both come with guarantees on the probability of correctly identifying the optimal arm and on the simple regret. R-SR also matches a derived lower bound on the error probability, up to constants.
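Classic successive rejects (Audibert, Bubeck & Munos, 2010) splits a budget T over K arms into K − 1 phases and discards the arm with the lowest estimate at the end of each phase. The abstract says R-SR employs a successive reject procedure. The sketch below assumes R-SR keeps the classic phase lengths, which the abstract does not confirm. Under that assumption, every surviving arm is topped up to n_k pulls in phase k, where n_k = ⌈(T − K) / (loḡ(K)(K + 1 − k))⌉ and loḡ(K) = 1/2 + Σ_{i=2}^{K} 1/i. The function names are hypothetical. The sketch does not model how R-SR estimates a rising arm's value before each rejection.

```python
import math
from typing import List


def sr_phase_lengths(num_arms: int, budget: int) -> List[int]:
    """Cumulative per-arm pull counts n_1 <= ... <= n_{K-1} of classic
    successive rejects. Assumption: R-SR uses this same schedule."""
    if num_arms < 2 or budget < num_arms:
        raise ValueError("need at least two arms and budget >= number of arms")
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    return [
        math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - k)))
        for k in range(1, num_arms)
    ]


def total_pulls(num_arms: int, schedule: List[int]) -> int:
    """Pulls spent when phase k tops up each of the K + 1 - k surviving arms to n_k."""
    used, prev = 0, 0
    for k, n_k in enumerate(schedule, start=1):
        used += (num_arms + 1 - k) * (n_k - prev)
        prev = n_k
    return used


schedule = sr_phase_lengths(4, 100)
print(schedule, total_pulls(4, schedule))  # [16, 21, 31] 99
```

The schedule never exceeds the budget: here 99 of 100 rounds are used. Later phases give more pulls to fewer survivors, which matters in the rising setting because an arm's value keeps improving as it is pulled.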

Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected. This setting captures a wide range of scenarios in which the available options are learning entities whose performance improves (in expectation) over time (e.g., online best model selection). While previous works addressed the regret minimization problem, this paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs. In this scenario, given a fixed budget of rounds, we are asked to provide a recommendation about the best option at the end of the identification process. We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE, which resorts to a UCB-like approach, and R-SR, which employs a successive reject procedure. Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process and on the simple regret. Furthermore, we derive a lower bound on the error probability, matched by our R-SR (up to constants), and illustrate how the need for a sufficiently large budget is unavoidable in the SRB setting. Finally, we numerically validate the proposed algorithms in both synthetic and realistic environments.
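A UCB-like approach to fixed-budget BAI can be illustrated with a UCB-E-style index (Audibert, Bubeck & Munos, 2010) on a simulated rising bandit. This is a hedged sketch in the spirit of R-UCBE, not the paper's algorithm. Several parts are assumptions introduced for illustration:

* the arm model `RisingArm`, with a concave, non-decreasing exponential learning curve and Gaussian noise;
* the `windowed_mean` estimator, which averages only the most recent fraction of an arm's rewards so that early, low rewards do not understate an arm that is still improving;
* the exploration parameter `a`.

The paper's own estimator and exploration term may differ.

```python
import math
import random
from typing import List


class RisingArm:
    """Hypothetical rising arm: expected reward after n pulls is
    ceiling - (ceiling - start) * exp(-rate * n), non-decreasing and concave."""

    def __init__(self, start: float, ceiling: float, rate: float,
                 noise: float, rng: random.Random):
        self.start, self.ceiling, self.rate, self.noise = start, ceiling, rate, noise
        self.rng = rng
        self.pulls = 0

    def expected(self, n: int) -> float:
        return self.ceiling - (self.ceiling - self.start) * math.exp(-self.rate * n)

    def pull(self) -> float:
        reward = self.expected(self.pulls) + self.rng.gauss(0.0, self.noise)
        self.pulls += 1  # the arm improves each time it is selected
        return reward


def windowed_mean(rewards: List[float], window_frac: float) -> float:
    """Hypothetical estimator: mean of the most recent ceil(window_frac * N) rewards."""
    h = max(1, math.ceil(window_frac * len(rewards)))
    recent = rewards[-h:]
    return sum(recent) / len(recent)


def rising_ucb_e(arms: List[RisingArm], budget: int, a: float,
                 window_frac: float = 0.5) -> int:
    """Spend the whole budget with a UCB-E-style index, then recommend
    the arm with the highest windowed estimate (fixed-budget BAI)."""
    k = len(arms)
    if budget < k:
        raise ValueError("budget must allow at least one pull per arm")
    history: List[List[float]] = [[arm.pull()] for arm in arms]  # one pull each

    def index(i: int) -> float:
        # optimistic index: windowed estimate plus exploration bonus sqrt(a / N_i)
        return windowed_mean(history[i], window_frac) + math.sqrt(a / len(history[i]))

    for _ in range(budget - k):
        chosen = max(range(k), key=index)
        history[chosen].append(arms[chosen].pull())
    return max(range(k), key=lambda i: windowed_mean(history[i], window_frac))


arms = [
    RisingArm(start=0.6, ceiling=0.7, rate=0.1, noise=0.1, rng=random.Random(0)),
    RisingArm(start=0.2, ceiling=0.9, rate=0.05, noise=0.1, rng=random.Random(1)),
]
print(rising_ucb_e(arms, budget=2000, a=100.0))  # 1: the arm that starts low but rises higher
```

In this toy setup, arm 0 looks better early on but plateaus at 0.7, while arm 1 eventually reaches 0.9. The exploration bonus keeps arm 1 sampled long enough for its improvement to show up in the windowed estimate.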

