Resampled Mutual Information for Clustering and Community Detection
This work addresses the need for better clustering evaluation metrics, particularly for community detection in networks, although it appears incremental because it builds on existing approaches.
The paper tackles the problem of measuring clustering similarity by introducing resampled mutual information (ResMI), a novel measure that combines information-theoretic and pair-counting approaches. It demonstrates that ResMI is robust to common biases on synthetic datasets and effective on real contact-tracing networks.
We introduce resampled mutual information (ResMI), a novel measure of clustering similarity that combines insights from information-theoretic and pair-counting approaches to clustering and community detection. Like chance-corrected measures, ResMI satisfies the constant baseline property, meaning its expected value when comparing against a random clustering is constant. Unlike them, it requires no adjustment terms and is fully interpretable in the language of information theory. Experiments on synthetic datasets demonstrate that ResMI is robust to common biases exhibited by existing measures, particularly in settings with many clusters and asymmetric cluster-size distributions. Additionally, we show that ResMI identifies meaningful community structures in two real contact-tracing networks.
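The abstract does not give ResMI's formula, so the sketch below does not implement ResMI. It only illustrates, under stated assumptions, the two ingredients the measure is said to combine and the bias it is designed to avoid. `mutual_information` computes plain mutual information (in bits) between two labelings. `rand_index` is one representative pair-counting measure, chosen here for illustration. `mean_random_mi` estimates the average MI between independent, uniformly random labelings by Monte Carlo. All function names are hypothetical. The last function shows why a constant baseline matters: for plain MI, this chance-level value is positive and grows with the number of clusters.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Mutual information (bits) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += n_xy / n * math.log2(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def rand_index(a, b):
    """Pair counting: fraction of item pairs on which both labelings agree
    about 'same cluster' versus 'different cluster'."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(a)), 2):
        agree += (a[i] == a[j]) == (b[i] == b[j])
        total += 1
    return agree / total


def mean_random_mi(n_items, n_clusters, trials=200, seed=0):
    """Monte Carlo estimate of the MI between two independent, uniformly
    random labelings: the chance-level baseline of plain MI."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.randrange(n_clusters) for _ in range(n_items)]
        b = [rng.randrange(n_clusters) for _ in range(n_items)]
        total += mutual_information(a, b)
    return total / trials


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 2, 2]
    print("MI (bits):", mutual_information(truth, pred))
    print("Rand index:", rand_index(truth, pred))
    # Chance-level MI rises with the cluster count (no constant baseline)
    for k in (2, 10, 50):
        print(k, "clusters -> mean random MI:", round(mean_random_mi(100, k), 3))
```

Comparing unrelated labelings with many clusters therefore inflates plain MI. Chance-corrected measures remove this effect by subtracting an expected-value adjustment term, whereas the abstract states that ResMI satisfies the constant baseline property without such a term.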