LG MLJan 24, 2019

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

arXiv:1901.08386v114.028 citations

Originality Incremental advance

AI Analysis

This addresses the need for selecting multiple good solutions in applications like crowd-sourcing and drug-designing, offering a more effective alternative to single-arm or best-subset selection, though it builds incrementally on existing PAC frameworks.

The paper tackles the problem of identifying k out of the best m arms in stochastic multi-armed bandits, presenting a lower bound on sample complexity and a PAC algorithm (GLUCB) that improves efficiency on easy instances, with extensions to infinite-armed bandits showing additive poly-log improvements over prior work.

We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$-armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of `best subset selection' and that of selecting `one out of the best m' arms [arcsk 2017]. In applications such as crowd-sourcing and drug-designing, identifying a single good solution is often not sufficient. Moreover, finding the best subset might be hard due to the presence of many indistinguishably close solutions. Our generalisation of identifying exactly $k$ arms out of the best $m$, where $1 \leq k \leq m$, serves as a more effective alternative. We present a lower bound on the worst-case sample complexity for general $k$, and a fully sequential PAC algorithm, \GLUCB, which is more sample-efficient on easy instances. Also, extending our analysis to infinite-armed bandits, we present a PAC algorithm that is independent of $n$, which identifies an arm from the best $ρ$ fraction of arms using at most an additive poly-log number of samples than compared to the lower bound, thereby improving over [arcsk 2017] and [Aziz+AKA:2018]. The problem of identifying $k > 1$ distinct arms from the best $ρ$ fraction is not always well-defined; for a special class of this problem, we present lower and upper bounds. Finally, through a reduction, we establish a relation between upper bounds for the `one out of the best $ρ$' problem for infinite instances and the `one out of the best $m$' problem for finite instances. We conjecture that it is more efficient to solve `small' finite instances using the latter formulation, rather than going through the former.

View on arXiv PDF

Similar