Condorcet's Jury Theorem for Consensus Clustering and its Implications for Diversity
This work addresses a theoretical gap in consensus clustering and offers machine learning practitioners a new perspective on the role of diversity, though the contribution is incremental because it builds on existing ensemble methods.
The authors extended Condorcet's Jury Theorem to consensus clustering, showing that combining many partitions can improve performance under specific assumptions. They also challenged the idea that the diversity of the sample partitions is key, conjecturing instead that limiting the diversity of the mean partitions is necessary for quality control.
Condorcet's Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet's Jury Theorem to the mean partition approach under the additional assumptions that a unique ground-truth partition exists and sample partitions are drawn from a sufficiently small ball containing the ground-truth. As an implication of practical relevance, we question the claim that the quality of consensus clustering depends on the diversity of the sample partitions. Instead, we conjecture that limiting the diversity of the mean partitions is necessary for controlling the quality.
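For readers unfamiliar with the classical result that the abstract builds on, the following is a minimal sketch of Condorcet's Jury Theorem in its standard voting form, not of the mean partition extension proposed in the article. It assumes n independent voters (n odd, to avoid ties) who are each correct with the same probability p, and computes the probability that a strict majority is correct. The function name `majority_correct_probability` and the chosen values of n and p are illustrative assumptions.

```python
from math import comb


def majority_correct_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes every voter is correct with the same probability p and that
    votes are independent (the classical Condorcet setting).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer to avoid ties")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # With p > 1/2 the majority improves as n grows; with p < 1/2 it degrades.
    for p in (0.6, 0.4):
        for n in (1, 3, 11, 101):
            print(f"p={p}, n={n}: {majority_correct_probability(n, p):.4f}")
```

In this setting the majority becomes more reliable as n grows when p > 1/2 and less reliable when p < 1/2, which is the behavior ensemble classifiers appeal to and which the article seeks to transfer to mean partitions.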