Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances
This work addresses a specific challenge in multi-armed bandit optimization for scenarios with varying reward uncertainties, offering incremental improvements in algorithm design for this domain.
The paper tackles the problem of best-arm identification with fixed budget and heterogeneous reward variances by proposing two variance-adaptive algorithms, SHVar for known variances and SHAdaVar for unknown variances, which outperform prior methods with analytical guarantees in experiments.
We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound probabilities of misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to the optimal experiment design with unknown variances, we believe that our results are of a broad interest. Our experiments validate our theory, and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.