LG AI IT ST ML | May 10, 2023

Best Arm Identification in Bandits with Limited Precision Sampling

Kota Srinivas Reddy, P. N. Karthik, Nikhil Karamchandani, Jayakrishnan Nair

arXiv:2305.06082v1 | 3.82 citations

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation of bandit algorithms: in settings such as resource-constrained systems, the learner may be unable to select individual arms precisely. It offers an incremental improvement over existing tracking-based methods.

The paper tackles best arm identification in multi-armed bandits where the learner can sample arms only through predefined exploration bundles (boxes), and aims to minimize the expected stopping time subject to an error probability constraint. It presents an asymptotic lower bound, proposes a modified tracking-based algorithm that is asymptotically optimal even when the optimal allocation is non-unique, and provides non-asymptotic lower and upper bounds on the stopping time for the case where the arms accessible from different boxes do not overlap.

We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled arm and its instantaneous reward are revealed to the learner, whose goal is to find the best arm by minimising the expected stopping time, subject to an upper bound on the error probability. We present an asymptotic lower bound on the expected stopping time, which holds as the error probability vanishes. We show that the optimal allocation suggested by the lower bound is, in general, non-unique and therefore challenging to track. We propose a modified tracking-based algorithm to handle non-unique optimal allocations, and demonstrate that it is asymptotically optimal. We also present non-asymptotic lower and upper bounds on the stopping time in the simpler setting when the arms accessible from one box do not overlap with those of others.
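The sampling model described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the functions `sample_from_box` and `simulate`, the unit-variance Gaussian rewards, and the round-robin box selection are not taken from the paper. In particular, round-robin is a naive placeholder and not the modified tracking-based algorithm the authors propose.

```python
import random


def sample_from_box(box_probs, arm_means, rng):
    """Select a box: the box's distribution picks the arm, then a reward is drawn.

    box_probs[a] is the (hypothetical) probability that this box pulls arm a.
    """
    arms = range(len(arm_means))
    arm = rng.choices(arms, weights=box_probs, k=1)[0]
    # Assumption: unit-variance Gaussian rewards; the abstract does not fix a reward model.
    reward = rng.gauss(arm_means[arm], 1.0)
    return arm, reward


def simulate(boxes, arm_means, n_rounds, seed=0):
    """Sample boxes in round-robin order and return per-arm pull counts and reward sums."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    sums = [0.0] * len(arm_means)
    for t in range(n_rounds):
        box = boxes[t % len(boxes)]  # naive rule, NOT the paper's tracking rule
        arm, reward = sample_from_box(box, arm_means, rng)
        counts[arm] += 1  # the pulled arm is revealed to the learner
        sums[arm] += reward
    return counts, sums


def empirical_best_arm(counts, sums):
    """Return the arm with the highest empirical mean among arms pulled at least once."""
    pulled = [a for a in range(len(counts)) if counts[a] > 0]
    return max(pulled, key=lambda a: sums[a] / counts[a])


# Two boxes over three arms. The boxes do not overlap: box 0 reaches only arm 0,
# box 1 reaches arms 1 and 2 with equal probability.
boxes = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
arm_means = [0.0, 0.5, 1.0]
counts, sums = simulate(boxes, arm_means, n_rounds=2000)
best = empirical_best_arm(counts, sums)
```

The example uses non-overlapping boxes, the simpler setting for which the abstract reports non-asymptotic bounds. A fixed sampling budget replaces the stopping rule here, so the sketch shows only how observations are generated, not how the stopping time is controlled.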
