LGMEMLNov 22, 2025

Hierarchical Linkage Clustering Beyond Binary Trees and Ultrametrics

arXiv:2511.18056v1
Originality Incremental advance
AI Analysis

This work addresses fundamental issues in hierarchical clustering for data analysis, providing a more flexible and robust framework, though it is incremental in improving existing methods.

The paper tackles the limitations of traditional hierarchical clustering methods, which always return a hierarchy even if none exists, are restricted to binary trees, and are sensitive to linkage functions, by introducing the concept of a valid hierarchy and proving the existence of a finest valid hierarchy that is not constrained to binary structures and collapses to a star tree when no hierarchy exists.

Hierarchical clustering seeks to uncover nested structures in data by constructing a tree of clusters, where deeper levels reveal finer-grained relationships. Traditional methods, including linkage approaches, face three major limitations: (i) they always return a hierarchy, even if none exists, (ii) they are restricted to binary trees, even if the true hierarchy is non-binary, and (iii) they are highly sensitive to the choice of linkage function. In this paper, we address these issues by introducing the notion of a valid hierarchy and defining a partial order over the set of valid hierarchies. We prove the existence of a finest valid hierarchy, that is, the hierarchy that encodes the maximum information consistent with the similarity structure of the data set. In particular, the finest valid hierarchy is not constrained to binary structures and, when no hierarchical relationships exist, collapses to a star tree. We propose a simple two-step algorithm that first constructs a binary tree via a linkage method and then prunes it to enforce validity. We establish necessary and sufficient conditions on the linkage function under which this procedure exactly recovers the finest valid hierarchy, and we show that all linkage functions satisfying these conditions yield the same hierarchy after pruning. Notably, classical linkage rules such as single, complete, and average satisfy these conditions, whereas Ward's linkage fails to do so.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes