Anytime Hierarchical Clustering
This work offers an incremental improvement for researchers and practitioners who need scalable, distributed hierarchical clustering of dynamic data.
The authors address hierarchical clustering with an anytime method that iteratively refines an initial hierarchy into nested partitions satisfying a homogeneity requirement; their numerical evidence suggests the method can support decentralized, scalable algorithms for large, dynamic datasets.
We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees. We prove that, for a fixed data set, this sequence must terminate in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and for online tracking of clustering trees, with applications to large, dynamically changing databases and to anomaly detection.
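For readers less familiar with the linkage functions named above, the following is a minimal sketch of single, average, and complete linkage between two clusters of points. It assumes Euclidean distance and uses illustrative function names (`single_linkage`, `average_linkage`, `complete_linkage`) and toy data that do not come from the paper; it shows only the standard textbook definitions, not the paper's homogeneity measure or its tree-editing step.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def pairwise_distances(cluster_a, cluster_b):
    """All distances between a point of cluster_a and a point of cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Smallest inter-cluster point distance."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Largest inter-cluster point distance."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean of all inter-cluster point distances."""
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)


# Toy data (hypothetical): two small planar clusters.
left = [(0.0, 0.0), (0.0, 1.0)]
right = [(3.0, 0.0), (4.0, 0.0)]

s = single_linkage(left, right)
a = average_linkage(left, right)
c = complete_linkage(left, right)
print(s, a, c)
```

For any pair of nonempty clusters the three values are ordered as single ≤ average ≤ complete, which is one reason a local homogeneity criterion can be stated in a form that works across all three.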