Homogeneity of Cluster Ensembles
This work tackles a foundational issue in consensus clustering that matters to both researchers and practitioners, though it is incremental in that it builds on the existing statistical theory of partitions.
The paper addresses the non-uniqueness of the expectation and the mean in cluster ensembles, which complicates statistical inference and the assessment of cluster stability. It introduces homogeneity as a measure of how likely a unique mean is and shows that homogeneity is related to cluster stability. Empirical results indicate that uniqueness is not exceptional for real-world data.
The expectation and the mean of partitions generated by a cluster ensemble are not unique in general. This issue poses challenges in statistical inference and cluster stability. In this contribution, we state sufficient conditions for uniqueness of expectation and mean. The proposed conditions show that a unique mean is neither exceptional nor generic. To cope with this issue, we introduce homogeneity as a measure of how likely a unique mean is for a sample of partitions. We show that homogeneity is related to cluster stability. This result points to a possible conflict between cluster stability and diversity in consensus clustering. To assess homogeneity in a practical setting, we propose an efficient way to compute a lower bound of homogeneity. Empirical results using the k-means algorithm suggest that uniqueness of the mean partition is not exceptional for real-world data. Moreover, for samples of high homogeneity, uniqueness can be enforced by increasing the number of data points or by removing outlier partitions. In a broader context, this contribution can be placed as a further step towards a statistical theory of partitions.
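To make the non-uniqueness of the mean concrete, the following is a minimal brute-force sketch on three data points. It rests on assumptions that are not taken from the paper: the mean partition is treated as any minimizer of the sum of squared distances to the sample partitions, and the distance is a simple pair-counting distance (the number of point pairs on which two partitions disagree), which may differ from the metric used in the paper. The function names `all_partitions`, `pair_distance`, and `mean_partitions` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def all_partitions(n):
    """Yield every partition of n points as a tuple of cluster labels.

    Labels follow the restricted-growth convention, so each partition
    appears exactly once regardless of how its clusters are named.
    """
    def extend(prefix, n_labels):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # A point may join an existing cluster or open one new cluster.
        for label in range(n_labels + 1):
            yield from extend(prefix + [label], max(n_labels, label + 1))

    yield from extend([], 0)


def pair_distance(p, q):
    """Assumed distance: number of point pairs grouped differently by p and q."""
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )


def mean_partitions(sample):
    """Return all partitions minimizing the sum of squared distances to the sample."""
    n = len(sample[0])
    candidates = list(all_partitions(n))
    cost = {z: sum(pair_distance(z, x) ** 2 for x in sample) for z in candidates}
    best = min(cost.values())
    return [z for z in candidates if cost[z] == best]


# A sample of two very different partitions: one cluster vs. all singletons.
diverse = [(0, 0, 0), (0, 1, 2)]
# A sample of two similar partitions that differ in a single assignment.
similar = [(0, 1, 1), (0, 1, 0)]

print(len(mean_partitions(diverse)))  # several minimizers: the mean is not unique
print(len(mean_partitions(similar)))  # a single minimizer: the mean is unique
```

Under these assumptions, the sample of two dissimilar partitions has several equally good means, while the sample of two similar partitions has exactly one. This mirrors, in a toy setting, the tension the abstract describes between diversity and a unique mean; it does not compute the homogeneity measure or its lower bound.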