LG · ML · Aug 9, 2024

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

arXiv:2408.04869v3 · 2 citations · h-index: 5

AI Analysis

This work addresses efficient best-arm identification under a fixed budget, a problem relevant to researchers and practitioners working on bandit algorithms. It is an incremental improvement that adapts UCB methods to use Bayesian priors.

The paper tackles the fixed-budget best-arm identification problem in a Bayesian setting by proposing a UCB exploration algorithm that learns prior information. It establishes upper bounds of order $\tilde{O}(\sqrt{K/n})$ on the failure probability and the simple regret, and reports outperforming state-of-the-art baselines empirically.

We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI. However, it is well known that their optimal regret depends theoretically on the problem instance, which we show to be an artifact in many fixed-budget BAI problems. In this paper we propose a UCB exploration algorithm that is both theoretically and empirically efficient for the fixed-budget BAI problem under a Bayesian setting. The key idea is to learn prior information, which can enhance the performance of UCB-based BAI algorithms as it has done in the cumulative regret minimization problem. We establish bounds on the failure probability and the simple regret for the Bayesian BAI problem, providing upper bounds of order $\tilde{O}(\sqrt{K/n})$, up to logarithmic factors, where $n$ represents the budget and $K$ denotes the number of arms. Furthermore, we demonstrate through empirical results that our approach consistently outperforms state-of-the-art baselines.
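For readers unfamiliar with UCB-based exploration in fixed-budget BAI, the following is a minimal sketch of the classic UCB-E allocation rule that the abstract refers to as UCBE. It is not the prior-learning algorithm proposed in the paper, whose details are not given here. The function name `ucbe_best_arm`, the callback `pull`, and the exploration parameter `a` are illustrative assumptions; in particular, `a` is simply supplied by the caller rather than tuned to the instance.

```python
import math
import random


def ucbe_best_arm(pull, num_arms, budget, a):
    """Sketch of UCB-E for fixed-budget best-arm identification.

    pull(arm) -> float is a hypothetical callback returning one reward sample.
    `a` is the exploration parameter (an assumption: chosen by the caller).
    Returns the recommended arm and the number of pulls spent on each arm.
    """
    if budget < num_arms:
        raise ValueError("budget must allow at least one pull per arm")

    counts = [0] * num_arms   # number of pulls per arm
    sums = [0.0] * num_arms   # cumulative reward per arm

    for t in range(budget):
        if t < num_arms:
            # Initialization: pull every arm once.
            arm = t
        else:
            # Pull the arm with the largest index: empirical mean + sqrt(a / pulls).
            arm = max(
                range(num_arms),
                key=lambda i: sums[i] / counts[i] + math.sqrt(a / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward

    # Recommend the arm with the highest empirical mean once the budget is spent.
    best = max(range(num_arms), key=lambda i: sums[i] / counts[i])
    return best, counts


if __name__ == "__main__":
    # Usage example on hypothetical Bernoulli arms.
    random.seed(0)
    true_means = [0.3, 0.5, 0.7]
    sample = lambda arm: 1.0 if random.random() < true_means[arm] else 0.0
    best, counts = ucbe_best_arm(sample, num_arms=3, budget=300, a=2.0)
    print("recommended arm:", best, "pulls per arm:", counts)
```

The instance dependence mentioned in the abstract shows up here through `a`: the classic analysis of UCB-E chooses it using a problem-complexity quantity that is unknown in practice, which is the kind of dependence the paper aims to avoid by learning prior information.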
