On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering
This work addresses a gap in hybrid clustering by giving researchers theoretical properties to guide the choice of dissimilarity measure, though the contribution is incremental in that it builds on existing dissimilarity concepts.
The paper tackles the lack of data-independent properties for evaluating density-based dissimilarity measures in hybrid clustering. It proposes six such properties and introduces a new measure based on Kullback-Leibler information that satisfies all of them, with the effect of the properties illustrated on real and simulated data sets.
Hybrid clustering combines partitional and hierarchical clustering for computational effectiveness and versatility in cluster shape. In such clustering, a dissimilarity measure plays a crucial role in the hierarchical merging. The dissimilarity measure has a great impact on the final clustering, and data-independent properties are needed to choose the right dissimilarity measure for the problem at hand. Properties for distance-based dissimilarity measures have been studied for decades, but properties for density-based dissimilarity measures have so far received little attention. Here, we propose six data-independent properties to evaluate density-based dissimilarity measures associated with hybrid clustering, regarding equality, orthogonality, symmetry, outlier and noise observations, and light-tailed models for heavy-tailed clusters. The significance of the properties is investigated, and we study some well-known dissimilarity measures based on Shannon entropy, misclassification rate, Bhattacharyya distance and Kullback-Leibler divergence with respect to the proposed properties. As none of them satisfies all the proposed properties, we introduce a new dissimilarity measure based on the Kullback-Leibler information and show that it satisfies all proposed properties. The effect of the proposed properties is also illustrated on several real and simulated data sets.
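As a minimal illustration of what checking such properties can look like, the sketch below evaluates two of the quantities named above, the Kullback-Leibler divergence and the Bhattacharyya distance, in closed form for univariate Gaussian components. The function names, the restriction to univariate Gaussians, and the reading of "equality" as zero dissimilarity for identical densities and "symmetry" as d(p, q) = d(q, p) are assumptions made for illustration. This is not the new measure proposed in the paper, and the paper's formal property definitions may differ.

```python
import math


def kl_gauss(mu1, s1, mu2, s2):
    """KL(p || q) for p = N(mu1, s1^2) and q = N(mu2, s2^2), with s1, s2 > 0."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5


def symmetrized_kl_gauss(mu1, s1, mu2, s2):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return kl_gauss(mu1, s1, mu2, s2) + kl_gauss(mu2, s2, mu1, s1)


def bhattacharyya_gauss(mu1, s1, mu2, s2):
    """Bhattacharyya distance between two univariate Gaussians."""
    v1, v2 = s1**2, s2**2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2 * s1 * s2))
    return mean_term + var_term


def is_symmetric(d, p, q, tol=1e-12):
    """Assumed symmetry check: d(p, q) equals d(q, p) up to tolerance."""
    return math.isclose(d(*p, *q), d(*q, *p), abs_tol=tol)


def is_zero_on_equal(d, p, tol=1e-12):
    """Assumed equality check: d(p, p) is zero up to tolerance."""
    return abs(d(*p, *p)) <= tol


if __name__ == "__main__":
    # Two hypothetical cluster components, given as (mean, standard deviation).
    p, q = (0.0, 1.0), (1.0, 2.0)
    for name, d in [
        ("KL", kl_gauss),
        ("symmetrized KL", symmetrized_kl_gauss),
        ("Bhattacharyya", bhattacharyya_gauss),
    ]:
        print(name, "symmetric:", is_symmetric(d, p, q),
              "zero on equal:", is_zero_on_equal(d, p))
```

Under these assumptions, the plain Kullback-Leibler divergence gives zero for identical components but is not symmetric in its arguments, whereas the symmetrized form and the Bhattacharyya distance are symmetric. Checks of this kind depend only on the densities involved, not on any particular data set.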