LG DS NANov 21, 2016

Correlation Clustering with Low-Rank Matrices

Nate Veldt, Anthony Wirth, David F. Gleich

arXiv:1611.07305v24.913 citations

Originality Incremental advance

AI Analysis

This provides a theoretical and practical advance for clustering tasks where data has low-rank structure, though it is incremental as it builds on existing correlation clustering methods.

The paper tackles the NP-hard correlation clustering problem by showing it can be solved exactly in polynomial time for low-rank positive semidefinite matrices, but remains NP-hard with negative eigenvalues, and develops an efficient algorithm using zonotope vertex enumeration that demonstrates effectiveness on synthetic and real-world data.

Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled 'similar' or 'dissimilar.' Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore how to solve the correlation clustering objective exactly when the data to be clustered can be represented by a low-rank matrix. We prove in particular that correlation clustering can be solved in polynomial time when the underlying matrix is positive semidefinite with small constant rank, but that the task remains NP-hard in the presence of even one negative eigenvalue. Based on our theoretical results, we develop an algorithm for efficiently "solving" low-rank positive semidefinite correlation clustering by employing a procedure for zonotope vertex enumeration. We demonstrate the effectiveness and speed of our algorithm by using it to solve several clustering problems on both synthetic and real-world data.

View on arXiv PDF

Similar