MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
This work addresses the challenge of understanding complex, non-hierarchical multiscale structures in data for researchers in topological data analysis and clustering, though it is incremental as it builds on existing persistent homology methods.
The paper tackles the problem of analyzing and comparing multiscale clusterings in datasets by introducing the Multiscale Clustering Bifiltration (MCbiF), a 2-parameter filtration that encodes cluster intersections across scales, and shows that its multiparameter persistent homology yields topological feature maps that outperform information-based baselines on synthetic and real-world tasks, such as measuring non-hierarchies in mice social groupings.
Datasets often possess an intrinsic multiscale structure with meaningful descriptions at different levels of coarseness. Such datasets are naturally described as multi-resolution clusterings, i.e., not necessarily hierarchical sequences of partitions across scales. To analyse and compare such sequences, we use tools from topological data analysis and define the Multiscale Clustering Bifiltration (MCbiF), a 2-parameter filtration of abstract simplicial complexes that encodes cluster intersection patterns across scales. The MCbiF can be interpreted as a higher-order extension of Sankey diagrams and reduces to a dendrogram for hierarchical sequences. We show that the multiparameter persistent homology (MPH) of the MCbiF yields a finitely presented and block decomposable module, and its stable Hilbert functions characterise the topological autocorrelation of the sequence of partitions. In particular, at dimension zero, the MPH captures violations of the refinement order of partitions, whereas at dimension one, the MPH captures higher-order inconsistencies between clusters across scales. We demonstrate through experiments the use of MCbiF Hilbert functions as topological feature maps for downstream machine learning tasks. MCbiF feature maps outperform information-based baseline features on both regression and classification tasks on synthetic sets of non-hierarchical sequences of partitions. We also show an application of MCbiF to real-world data to measure non-hierarchies in wild mice social grouping patterns across time.