Active clustering with bandit feedback
This work addresses clustering in active learning with bandit feedback, offering theoretical guarantees and an efficient algorithm for scenarios where arms are grouped by mean vectors, which is incremental as it builds on existing clustering and bandit frameworks.
The paper tackles the Active Clustering Problem by deriving a non-asymptotic lower bound for the budget needed to uncover a hidden partition of arms in a stochastic bandit setting and introduces the ACB algorithm that matches this bound in most regimes, improving over uniform sampling.
We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed stochastic bandit with $d$-dimensional subGaussian feedback. There exists a hidden partition of the arms into $K$ groups, such that arms within the same group, share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget - i.e., the least number of observation - and with a probability of error smaller than a prescribed constant $δ$. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the active setting.