LG · ML · Oct 8, 2020

Near-Optimal Comparison Based Clustering

arXiv:2010.03918v2 · 12 citations
Originality: Highly original
AI Analysis

The paper addresses the challenge of clustering without an explicit similarity measure, for applications where only comparative data is available. It offers a novel method for a known bottleneck.

The paper tackles the problem of clustering objects when only ordinal comparisons are available, rather than explicit similarity measures. It proposes a two-step method that first estimates a pairwise similarity matrix from the comparisons and then clusters it using semi-definite programming. The authors show theoretically that this approach can exactly recover a planted clustering with a near-optimal number of passive comparisons, and they validate the findings empirically, including on real data.

The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily available and, instead, one only observes ordinal comparisons such as "object i is more similar to j than to k." In this paper, we tackle this problem using a two-step procedure: we estimate a pairwise similarity matrix from the comparisons before using a clustering method based on semi-definite programming (SDP). We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.
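The first step of the two-step procedure, turning triplet comparisons into a pairwise similarity matrix, can be sketched as follows. This is a minimal illustration only: the additive counting rule, the function names `estimate_similarity` and `planted_triplets`, and the noise-free toy data are assumptions made for this example and are not taken from the paper, whose exact estimator may differ. The second step (SDP-based clustering of the estimated matrix) is not shown.

```python
def estimate_similarity(n, triplets):
    """Build an n x n similarity matrix from triplet comparisons.

    Each triplet (i, j, k) reads "object i is more similar to j than to k".
    Assumed rule (illustrative, not the paper's estimator): a triplet adds
    one vote of similarity to the pair (i, j) and removes one from (i, k).
    """
    S = [[0.0] * n for _ in range(n)]
    for i, j, k in triplets:
        S[i][j] += 1.0  # evidence that i and j are close
        S[j][i] += 1.0
        S[i][k] -= 1.0  # evidence that i and k are far apart
        S[k][i] -= 1.0
    return S


def planted_triplets(labels):
    """Hypothetical helper: all noise-free triplets from a planted clustering.

    Emits (i, j, k) whenever i and j share a cluster and k does not.
    """
    n = len(labels)
    return [
        (i, j, k)
        for i in range(n)
        for j in range(n)
        for k in range(n)
        if len({i, j, k}) == 3
        and labels[i] == labels[j]
        and labels[i] != labels[k]
    ]
```

On a toy planted clustering such as `labels = [0, 0, 0, 1, 1, 1]`, calling `estimate_similarity(len(labels), planted_triplets(labels))` gives positive entries for pairs in the same cluster and negative entries for pairs in different clusters, which is the kind of block structure a downstream clustering step can exploit. In the passive setting described above, only a subset of such triplets would be observed, possibly with noise.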
