ML LGApr 16

Best of both worlds: Stochastic & adversarial best-arm identification

Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

arXiv:2604.1486081.454 citationsh-index: 84

AI Analysis

This work addresses the fundamental challenge of designing robust bandit algorithms for best-arm identification, providing impossibility results and near-optimal solutions for practitioners who face unknown reward types.

The paper studies best-arm identification in bandits with both stochastic and adversarial rewards, showing that a learner cannot be optimal in both settings simultaneously. It provides a lower bound characterizing the optimal rate for stochastic problems under adversarial robustness and proposes a parameter-free algorithm that matches this bound up to log factors while remaining robust to adversarial rewards.

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.

View on arXiv PDF

Similar