LG MLMay 27, 2019

Thresholding Bandit with Optimal Aggregate Regret

Chao Tao, Saùl Blanco, Jian Peng, Yuan Zhou

arXiv:1905.11046v16.013 citations

Originality Incremental advance

AI Analysis

This provides an efficient solution for multi-armed bandit selection problems in scenarios requiring threshold-based classification, though it appears incremental relative to prior bandit algorithms.

The authors tackled the thresholding bandit problem of identifying arms with mean rewards above a threshold θ within a fixed budget T, introducing the LSA algorithm that minimizes aggregate regret. They proved it is instance-wise asymptotically optimal and demonstrated superior empirical performance over existing methods.

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $θ$, with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.

View on arXiv PDF

Similar