LG PR ST MLMay 20, 2016

Fast Randomized Semi-Supervised Clustering

Alaa Saade, Florent Krzakala, Marc Lelarge, Lenka Zdeborová

arXiv:1605.06422v32.72 citations

Originality Incremental advance

AI Analysis

This addresses the problem of semi-supervised clustering with limited labeled data for researchers in machine learning and data analysis, though it appears incremental as it builds on existing non-backtracking operator methods.

The paper tackles the problem of clustering partially labeled data using minimal random pairwise comparisons, introducing an efficient local algorithm based on power iteration of the non-backtracking operator. For two clusters, they show that a small classification error can be achieved with O(n) random measurements, where n is the dataset size, and demonstrate efficiency in time and space complexities.

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a simple model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from $O(n)$ randomly chosen measurements, where $n$ is the number of items in the dataset. Our algorithm is therefore efficient both in terms of time and space complexities. We also investigate numerically the performance of the algorithm on synthetic and real world data.

View on arXiv PDF

Similar