LG AI IT MLJan 25, 2022

Almost Optimal Variance-Constrained Best Arm Identification

Yunlong Hou, Vincent Y. F. Tan, Zixin Zhong

arXiv:2201.10142v211.118 citationsHas Code

Originality Highly original

AI Analysis

This work addresses risk-constrained decision-making in sequential experiments, such as clinical trials or finance, by providing an efficient algorithm for variance-aware best arm identification, though it is incremental as it builds on existing bandit methods.

The paper tackles the problem of identifying the best arm in a multi-armed bandit setting under a fixed-confidence setup with a strict variance constraint, proposing the VA-LUCB algorithm and showing it achieves near-optimal sample complexity up to logarithmic factors, with experiments indicating it outperforms a competitor in riskier instances.

We design and analyze VA-LUCB, a parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and under a stringent constraint that the variance of the chosen arm is strictly smaller than a given threshold. An upper bound on VA-LUCB's sample complexity is shown to be characterized by a fundamental variance-aware hardness quantity $H_{VA}$. By proving a lower bound, we show that sample complexity of VA-LUCB is optimal up to a factor logarithmic in $H_{VA}$. Extensive experiments corroborate the dependence of the sample complexity on the various terms in $H_{VA}$. By comparing VA-LUCB's empirical performance to a close competitor RiskAverse-UCB-BAI by David et al. (2018), our experiments suggest that VA-LUCB has the lowest sample complexity for this class of risk-constrained best arm identification problems, especially for the riskiest instances.

View on arXiv PDF Code

Similar