LG MLMay 28, 2019

Correlation Clustering with Adaptive Similarity Queries

Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, Fabio Vitale

arXiv:1905.11902v37.720 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficient clustering with limited data queries, offering theoretical and practical insights for applications in data analysis and machine learning, though it is incremental in extending active learning to correlation clustering.

The paper tackles the problem of correlation clustering by treating it as an active learning task, where similarity scores are learned through queries to minimize both clustering disagreements and query count. It presents algorithms that achieve near-optimal trade-offs with cluster recovery guarantees and provides information-theoretic bounds on query requirements for given error levels.

In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.

View on arXiv PDF Code

Similar