LG ITFeb 5

Almost Asymptotically Optimal Active Clustering Through Pairwise Observations

Rachel S. Y. Teo, P. N. Karthik, Ramya Korlakai Vinayak, Vincent Y. F. Tan

arXiv:2602.05690v11.4h-index: 9

Originality Highly original

AI Analysis

This addresses the challenge of efficient active clustering with noisy feedback, which is incremental as it builds on existing change-of-measure techniques to achieve near-optimal query complexity.

The paper tackles the problem of clustering items into an unknown number of groups using noisy pairwise queries, establishing a fundamental lower bound on the expected queries needed for a desired accuracy and designing an asymptotically optimal algorithm with performance within a constant multiple of this bound.

We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses. At each time step, an agent is allowed to query pairs of items and observe bandit binary feedback. If the pair of items belongs to the same (resp.\ different) cluster, the observed feedback is $1$ with probability $p>1/2$ (resp.\ $q<1/2$). Leveraging the ubiquitous change-of-measure technique, we establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the clustering accuracy, formulated as a sup-inf optimization problem. Building on this theoretical foundation, we design an asymptotically optimal algorithm in which the stopping criterion involves an empirical version of the inner infimum -- the Generalized Likelihood Ratio (GLR) statistic -- being compared to a threshold. We develop a computationally feasible variant of the GLR statistic and show that its performance gap to the lower bound can be accurately empirically estimated and remains within a constant multiple of the lower bound.

View on arXiv PDF

Similar