DSDCLGJun 15, 2021

Correlation Clustering in Constant Many Parallel Rounds

arXiv:2106.08448v152 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of scaling clustering algorithms for large graphs in distributed settings, though it appears incremental as it builds on existing MPC frameworks.

The paper tackles the problem of correlation clustering in the massively parallel computation (MPC) model by proposing an algorithm that achieves a constant approximation in a constant number of rounds with sublinear memory per machine, which is faster than prior work.

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes