SI LG | Nov 21, 2024

Resampled Mutual Information for Clustering and Community Detection

arXiv:2412.03584v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the need for improved clustering evaluation metrics, particularly for community detection in networks, though it appears incremental as it builds on existing approaches.

The paper tackles the problem of measuring clustering similarity by introducing resampled mutual information (ResMI), a novel measure that combines information theory and pair counting, and demonstrates its robustness to biases in synthetic datasets and effectiveness in real contact tracing networks.

We introduce resampled mutual information (ResMI), a novel measure of clustering similarity that combines insights from information theoretic and pair counting approaches to clustering and community detection. Similar to chance-corrected measures, ResMI satisfies the constant baseline property, but it has the advantages of not requiring adjustment terms and being fully interpretable in the language of information theory. Experiments on synthetic datasets demonstrate that ResMI is robust to common biases exhibited by existing measures, particularly in settings with high cluster counts and asymmetric cluster distributions. Additionally, we show that ResMI identifies meaningful community structures in two real contact tracing networks.
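The abstract does not give the definition of ResMI, so the following is not an implementation of it. As a rough sketch of the two families of measures the abstract says ResMI draws on, the Python below computes plain (unadjusted) mutual information from a contingency table and the pair-counting Rand index. The function names `mutual_information` and `rand_index` are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(a, b):
    """Plain (unadjusted) mutual information, in nats, between two labelings.

    This is the standard information-theoretic quantity, NOT ResMI.
    """
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))   # contingency table n_xy
    count_a = Counter(a)         # cluster sizes in the first labeling
    count_b = Counter(b)         # cluster sizes in the second labeling
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def rand_index(a, b):
    """Pair counting: fraction of element pairs on which both labelings agree.

    A pair "agrees" if it is co-clustered in both labelings or separated in both.
    """
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

Neither of these raw quantities is chance-corrected, which is why adjusted variants exist; according to the abstract, ResMI attains a constant baseline without such adjustment terms.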

