Thresholding Bandit Problem with Both Duels and Pulls
This work addresses a crowdsourcing problem where dueling arms can be more cost-effective than direct pulls, though it is incremental as it extends an existing bandit framework.
The paper tackles the Thresholding Bandit Problem by introducing a setting that allows both pulling arms and dueling them, aiming to identify arms with mean rewards above a threshold more efficiently. It presents the Rank-Search algorithm, which outperforms baseline methods in experiments.
The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also \emph{duel} two arms and get the arm with a greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost-effective and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show the optimality of it. Experiments show that RS outperforms previous baseline algorithms that only use pulls or duels.