Closing the Gap on the Sample Complexity of 1-Identification
It addresses a fundamental open problem in pure exploration for multi-armed bandits, providing near-optimal sample complexity results.
The paper tackles the sample complexity of 1-identification in multi-armed bandits, deriving a new lower bound and designing an algorithm with tight upper bounds that close the gap to logarithmic factors.
1-identification is a fundamental multi-armed bandit formulation on pure exploration. An agent aims to determine whether there exists a qualified arm whose mean reward is not less than a known threshold $μ_0$, or to output \textsf{None} if it believes such an arm does not exist. The agent needs to guarantee its output is correct with probability at least $1-δ$, while making expected total pulling times $\mathbb{E}τ$ as small as possible. We work on 1-identification with two main contributions. (1) We utilize an optimization formulation to derive a new lower bound of $\mathbb{E}τ$, when there is at least one qualified arm. (2) We design a new algorithm, deriving tight upper bounds whose gap to lower bounds are up to a polynomial of logarithm factor across all problem instance. Our result complements the analysis of $\mathbb{E}τ$ when there are multiple qualified arms, which is an open problem left by history literature.