Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions
This work offers theoretical results on hierarchical clustering methods for data analysis but appears incremental, as it builds on previously proposed OWA-based linkages.
The paper generalizes hierarchical clustering linkages using Ordered Weighted Averaging (OWA) operators. It relates the Lance-Williams update formula to OWA-based linkages whose weights are generated from infinite coefficient sequences, and it gives conditions on these weight generators that prevent dendrogram inversions.
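For reference, the classical Lance-Williams recurrence gives the distance from a cluster K to a newly merged cluster I∪J as d(K, I∪J) = α_I d(K, I) + α_J d(K, J) + β d(I, J) + γ |d(K, I) - d(K, J)|. The minimal Python sketch below checks, on made-up toy clusters, that the textbook coefficients for single, complete, and average linkage reproduce the brute-force intercluster distances. It covers only these three classical cases, not the paper's OWA-based extension; the helper names (`pairwise`, `lance_williams`, `LINKAGES`) and the data are illustrative assumptions.

```python
import math
from itertools import product

def pairwise(U, V):
    """All Euclidean distances between points of cluster U and cluster V."""
    return [math.dist(u, v) for u, v in product(U, V)]

# Brute-force definitions of three classical linkages.
LINKAGES = {
    "single": lambda U, V: min(pairwise(U, V)),
    "complete": lambda U, V: max(pairwise(U, V)),
    "average": lambda U, V: sum(pairwise(U, V)) / (len(U) * len(V)),
}

def lance_williams(d_ki, d_kj, d_ij, n_i, n_j, method):
    """d(K, I u J) = a_i d(K,I) + a_j d(K,J) + beta d(I,J) + gamma |d(K,I) - d(K,J)|,
    using the textbook coefficients of the three classical linkages."""
    if method == "single":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "average":
        a_i, a_j, beta, gamma = n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    else:
        raise ValueError(f"unsupported method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)

# Hypothetical toy clusters used to compare the update with brute force.
cluster_i = [(0.0, 0.0), (1.0, 0.0)]
cluster_j = [(4.0, 1.0)]
cluster_k = [(2.0, 3.0), (3.0, 5.0), (0.5, 4.0)]

for method, link in LINKAGES.items():
    updated = lance_williams(
        link(cluster_k, cluster_i), link(cluster_k, cluster_j),
        link(cluster_i, cluster_j), len(cluster_i), len(cluster_j), method,
    )
    direct = link(cluster_k, cluster_i + cluster_j)
    assert math.isclose(updated, direct), method
```

The point of the recurrence is that a merge can be processed from the old distance values alone, without revisiting the individual points.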
Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point dissimilarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide conditions on the weight generators which guarantee that the resulting dendrograms are free from unaesthetic inversions.
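To make the OWA view concrete, here is a minimal Python sketch of an OWA-based linkage: all pairwise distances between two clusters are sorted and aggregated with a weight vector produced by a generator that depends only on the number m of pairs. We assume Euclidean distances and that the first weight applies to the smallest distance; the generator names (`nearest`, `trimmed`, `winsorised`) and their rounding and clamping rules are illustrative choices, not the paper's definition of weight generators based on infinite coefficient sequences.

```python
import math
from itertools import product
from typing import Callable, Sequence

# A weight generator maps the number m of pairwise distances to an
# OWA weight vector of length m (nonnegative entries summing to 1).
WeightGenerator = Callable[[int], list]

def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """OWA aggregation; weights[0] applies to the SMALLEST value (assumed convention)."""
    return sum(w * x for w, x in zip(weights, sorted(values)))

def owa_linkage(U, V, gen: WeightGenerator) -> float:
    """OWA of all pairwise Euclidean distances between clusters U and V."""
    d = [math.dist(u, v) for u, v in product(U, V)]
    return owa(d, gen(len(d)))

def single(m): return [1.0] + [0.0] * (m - 1)     # minimum
def complete(m): return [0.0] * (m - 1) + [1.0]   # maximum
def average(m): return [1.0 / m] * m              # arithmetic mean

def nearest(k):
    """Hypothetical: mean of the k smallest pairwise distances."""
    def gen(m):
        k_ = min(k, m)
        return [1.0 / k_] * k_ + [0.0] * (m - k_)
    return gen

def trimmed(p):
    """Hypothetical: drop floor(p*m) values from each end, average the rest."""
    def gen(m):
        t = min(math.floor(p * m), (m - 1) // 2)  # keep at least one value
        return [0.0] * t + [1.0 / (m - 2 * t)] * (m - 2 * t) + [0.0] * t
    return gen

def winsorised(p):
    """Hypothetical: clamp floor(p*m) values at each end, then average."""
    def gen(m):
        t = min(math.floor(p * m), (m - 1) // 2)
        w = [0.0] * m
        for i in range(t, m - t):
            w[i] = 1.0 / m
        w[t] += t / m          # the t smallest values are replaced by x_(t+1)
        w[m - t - 1] += t / m  # the t largest values are replaced by x_(m-t)
        return w
    return gen

# Two toy clusters on the real line (points are 1-tuples).
U = [(0.0,), (2.0,)]
V = [(3.0,), (4.0,), (10.0,)]
for name, gen in [("single", single), ("complete", complete), ("average", average),
                  ("2-nearest", nearest(2)), ("trimmed 20%", trimmed(0.2)),
                  ("winsorised 20%", winsorised(0.2))]:
    print(name, owa_linkage(U, V, gen))
```

For these clusters the sorted distances are 1, 2, 3, 4, 8, 10, so single linkage gives 1, complete linkage gives 10, the mean of the two nearest pairs gives 1.5, the 20% trimmed mean gives (2 + 3 + 4 + 8)/4 = 4.25, and the 20% winsorised mean gives (2 + 2 + 3 + 4 + 8 + 8)/6 = 4.5.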
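A dendrogram has an inversion when a later merge happens at a lower height than an earlier one. The Python sketch below runs a naive (cubic-time, unoptimised) agglomerative procedure with an OWA-based linkage, records the merge heights, and checks them for inversions. The generator `contrived` is a deliberately artificial, hypothetical example, not taken from the paper: it shows that an OWA-based linkage can invert when its weights change erratically with the number of pairs, which is why conditions on the weight generators are needed. Average linkage on the same points yields nondecreasing heights.

```python
import math
from itertools import product

def owa_linkage(U, V, gen):
    """OWA of all pairwise Euclidean distances between clusters U and V;
    gen(m) returns m weights, the first applied to the smallest distance."""
    d = sorted(math.dist(u, v) for u, v in product(U, V))
    return sum(w * x for w, x in zip(gen(len(d)), d))

def agglomerate(points, gen):
    """Naive agglomerative clustering; returns merge heights in merge order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Scan all cluster pairs for the smallest linkage value
        # (ties are resolved in favour of the first pair found).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = owa_linkage(clusters[i], clusters[j], gen)
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        heights.append(h)
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return heights

def has_inversion(heights, tol=1e-12):
    """True if some merge happens strictly below the previous merge height."""
    return any(b < a - tol for a, b in zip(heights, heights[1:]))

def average(m):
    return [1.0 / m] * m

def contrived(m):
    """Hypothetical, deliberately odd generator: the distance itself for m = 1,
    the maximum for m = 2, and the minimum for m >= 3."""
    if m == 1:
        return [1.0]
    if m == 2:
        return [0.0, 1.0]
    return [1.0] + [0.0] * (m - 1)

points = [(0.0,), (1.0,), (2.2,), (-1.5,)]
print(agglomerate(points, contrived), has_inversion(agglomerate(points, contrived)))
print(agglomerate(points, average), has_inversion(agglomerate(points, average)))
```

With the contrived generator, 0 and 1 merge at height 1, then 2.2 joins at height max(2.2, 1.2) = 2.2, and finally -1.5 joins at height min(1.5, 2.5, 3.7) = 1.5, which lies below the previous merge.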