Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version)
This work addresses a long-standing challenge in unsupervised learning: it makes clustering applicable in real-world scenarios where hyperparameters such as the number of clusters are unknown. The contribution appears incremental, however, since it builds on existing hierarchical and self-supervised techniques.
The paper tackles hyperparameter dependence in clustering by proposing SCMax, a parameter-free framework that integrates hierarchical agglomerative clustering with self-supervised learning to determine the optimal number of clusters. In experiments on multiple datasets, it outperforms existing methods designed for an unknown number of clusters.
Clustering is a fundamental task in unsupervised learning, but most existing methods heavily rely on hyperparameters such as the number of clusters or other sensitive settings, limiting their applicability in real-world scenarios. To address this long-standing challenge, we propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process. At each step of agglomeration, it creates a new, structure-aware data representation through a self-supervised learning task guided by the current clustering structure. We then introduce a nearest neighbor consensus score, which measures the agreement between the nearest neighbor-based merge decisions suggested by the original representation and the self-supervised one. The moment at which consensus maximization occurs can serve as a criterion for determining the optimal number of clusters. Extensive experiments on multiple datasets demonstrate that the proposed framework outperforms existing clustering approaches designed for scenarios with an unknown number of clusters.
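To make the consensus idea concrete, the following is a minimal sketch of how a nearest-neighbor consensus score of this kind might be computed at one agglomeration step. It is not the paper's definition. The function names (`nearest_cluster`, `consensus_score`, `best_step`), the use of centroid distances to pick each cluster's merge candidate, and the score as the fraction of clusters whose candidate agrees across the two representations are all illustrative assumptions. The self-supervised representation is passed in as an input, since how it is learned is not specified here.

```python
import numpy as np


def nearest_cluster(Z, labels):
    """For each cluster, return the label of its nearest other cluster.

    Assumption: clusters are compared by Euclidean distance between centroids.
    Expects at least two distinct labels.
    """
    ids = np.unique(labels)
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in ids])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cluster cannot be its own merge candidate
    return ids[dist.argmin(axis=1)]


def consensus_score(X_orig, X_ssl, labels):
    """Fraction of clusters whose nearest-neighbor merge candidate is the same
    under the original representation and the self-supervised one
    (a hypothetical stand-in for the consensus score described above)."""
    cand_orig = nearest_cluster(X_orig, labels)
    cand_ssl = nearest_cluster(X_ssl, labels)
    return float(np.mean(cand_orig == cand_ssl))


def best_step(scores):
    """Index of the agglomeration step with maximal consensus."""
    return int(np.argmax(scores))
```

In a full pipeline, `consensus_score` would be evaluated after every merge, and the clustering at the step returned by `best_step` would be kept, which fixes the number of clusters without a user-supplied parameter.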