Analysing Multiscale Clusterings with Persistent Homology
This work addresses a foundational challenge in data clustering for researchers and practitioners, offering a novel topological framework, though it is incremental in applying existing tools to a specific problem.
The paper tackles the problem of analyzing and comparing sequences of partitions in multiscale clustering by introducing the Multiscale Clustering Filtration (MCF), which uses persistent homology to measure hierarchy and track conflicts across scales, demonstrating its utility for characterizing and classifying clusterings with synthetic data.
In data clustering, it is often desirable to find not just a single partition into clusters but a sequence of partitions that describes the data at different scales (or levels of coarseness). A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions. Here, we use tools from topological data analysis and introduce the Multiscale Clustering Filtration (MCF), a well-defined and stable filtration of abstract simplicial complexes that encodes arbitrary cluster assignments in a sequence of partitions across scales of increasing coarseness. We show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy of this sequence, and the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we provide an equivalent construction via a nerve complex filtration, and we show that, in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. Using synthetic data, we then illustrate how the persistence diagram of the MCF provides a feature map that can serve to characterise and classify multiscale clusterings.