SI · LG · SOC-PH · ML · Oct 22, 2019

Clustering in graphs and hypergraphs with categorical edge labels

Ilya Amburg, Nate Veldt, Austin R. Benson

arXiv:1910.09943v2 · 119 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for rigorous methods to analyze complex network data with higher-order interactions and multiple edge types. It is an incremental advance: it builds on correlation clustering but extends it to hypergraphs and categorical edge labels.

The paper tackles the problem of clustering nodes in hypergraphs with categorical edge labels, where clusters are groups of nodes that frequently participate in the same type of interaction. It yields efficient polynomial-time algorithms when there are two label types and fast approximation algorithms with theoretical guarantees when there are more than two.
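The summary does not state the clustering objective explicitly. As a minimal sketch, assuming the objective counts "mistakes" (a hyperedge with label c is a mistake if any of its nodes is assigned a label other than c, and every node receives exactly one label), it can be evaluated as below. The function name `mistakes` and the `(nodes, label)` edge representation are hypothetical choices for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: a labeled hyperedge is (member nodes, label).
LabeledEdge = Tuple[Tuple[Hashable, ...], Hashable]


def mistakes(edges: List[LabeledEdge], assignment: Dict[Hashable, Hashable]) -> int:
    """Count hyperedges that are not fully inside the cluster of their own label.

    Assumption: every node gets exactly one label (its cluster), and a hyperedge
    with label c is satisfied only if all of its nodes are assigned c.
    """
    return sum(
        1
        for nodes, label in edges
        if any(assignment[v] != label for v in nodes)
    )


if __name__ == "__main__":
    edges = [
        (("a", "b", "c"), "red"),
        (("c", "d"), "blue"),
        (("d", "e"), "blue"),
    ]
    assignment = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "blue"}
    print(mistakes(edges, assignment))  # prints 1: node c breaks the blue edge {c, d}
```

Under this assumed objective, a node shared by edges of different labels (like `c` above) forces at least one mistake, which is what makes the optimization nontrivial.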

Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called "higher-order interactions" that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels (i.e., different interaction types), where clusters correspond to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.
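The abstract says the two-label case can be solved in polynomial time with minimum cuts but does not give the construction. Below is a minimal sketch, assuming the objective is the total weight of hyperedges that have at least one node assigned a label different from the edge's own. It uses a standard hypergraph-cut gadget (one auxiliary node per hyperedge, joined to its members by infinite-capacity arcs), which is not necessarily the exact reduction in the paper. Max flow is a from-scratch Edmonds-Karp so the block stays stdlib-only; `two_label_clustering` and `max_flow_source_side` are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Set, Tuple

INF = float("inf")


def max_flow_source_side(cap, s, t) -> Tuple[float, Set]:
    """Edmonds-Karp max flow. Mutates `cap` into the residual graph and returns
    (flow value, nodes reachable from s in the final residual graph)."""
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path left: the reachable set is the source side of a min cut.
            return flow, set(parent)
        # Bottleneck along the path (finite here: every path leaves s on a finite arc).
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def two_label_clustering(
    edges: List[Tuple[Tuple[Hashable, ...], Hashable, float]],
    labels: Tuple[Hashable, Hashable],
) -> Tuple[float, Dict[Hashable, Hashable]]:
    """Assign each node one of two labels, minimizing the total weight of
    hyperedges with at least one node assigned the other label (assumed objective)."""
    label_a, label_b = labels
    s, t = ("source",), ("sink",)  # internal keys cannot collide with ("node", u)
    cap = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for i, (members, label, weight) in enumerate(edges):
        nodes.update(members)
        aux = ("edge", i)  # one auxiliary node per hyperedge
        if label == label_a:
            # s->aux must be cut (cost = weight) iff some member lands on the sink side.
            cap[s][aux] += weight
            for u in members:
                cap[aux][("node", u)] = INF
        elif label == label_b:
            # aux->t must be cut (cost = weight) iff some member lands on the source side.
            cap[aux][t] += weight
            for u in members:
                cap[("node", u)][aux] = INF
        else:
            raise ValueError(f"unexpected label {label!r}")
    cost, source_side = max_flow_source_side(cap, s, t)
    # Source side of the cut = label_a, sink side = label_b.
    assignment = {u: label_a if ("node", u) in source_side else label_b for u in nodes}
    return cost, assignment


if __name__ == "__main__":
    edges = [
        (("a", "b"), "red", 2.0),
        (("b", "c"), "blue", 3.0),
    ]
    cost, assignment = two_label_clustering(edges, ("red", "blue"))
    print(cost, assignment)
```

Because the returned source side is the set reachable from `s` in the residual graph, ties between equally good cuts resolve toward the second label. For the usage example, node `b` is shared by a red edge of weight 2 and a blue edge of weight 3, so the minimum cost is 2.0.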
