ML | LG | Mar 29, 2018

On Hyperparameter Search in Cluster Ensembles

arXiv:1803.11008v1 | 7 citations
Originality: Incremental advance
AI Analysis

The paper addresses a long-standing challenge in unsupervised learning: it gives researchers and practitioners a method for selecting clustering algorithms and tuning their hyperparameters.

The paper tackles the problem of evaluating clustering algorithms and hyperparameters in the absence of robust validity scores. It proposes using cluster ensemble aggregation techniques, such as consensus clustering, as a quality measure. The authors demonstrate that the normalized mutual information between each individual clustering and the ensemble consensus can identify the standout configuration, even when the consensus is distorted.

Quality assessment of models in unsupervised learning, and clustering verification in particular, has been a long-standing problem in machine learning research. The lack of robust and universally applicable cluster validity scores often reduces algorithm selection and hyperparameter evaluation to guesswork. In this paper, we show that cluster ensemble aggregation techniques such as consensus clustering may be used to evaluate clusterings and their hyperparameter configurations. We use normalized mutual information to compare individual members of a clustering ensemble to the constructed consensus of the whole ensemble and show that the resulting score can serve as an overall quality measure for clustering problems. This method is capable of highlighting the standout clustering and hyperparameter configuration in the ensemble even in the case of a distorted consensus. We apply this very general framework to various data sets and give possible directions for future research.
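
The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes a simple co-association consensus (two points are linked when a majority of the clusterings put them in the same cluster, and connected components become consensus clusters), which is only one possible aggregation scheme; the abstract does not specify the consensus function the paper uses. The function names, the 0.5 threshold, the geometric-mean NMI normalization, and the toy label vectors are all illustrative assumptions.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information between two label lists.

    Uses geometric-mean normalization, MI / sqrt(H(a) * H(b)); this is
    an assumed choice, other normalizations exist.
    """
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    if h_a == 0 or h_b == 0:
        # A single-cluster labeling carries no information.
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    return mi / sqrt(h_a * h_b)


def consensus(ensemble, threshold=0.5):
    """Toy consensus: link points co-clustered in more than `threshold`
    of the ensemble members, then return connected components as labels."""
    n, m = len(ensemble[0]), len(ensemble)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(labels[i] == labels[j] for labels in ensemble)
            if together / m > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def score_ensemble(ensemble):
    """Score every ensemble member by its NMI with the consensus."""
    cons = consensus(ensemble)
    return [nmi(labels, cons) for labels in ensemble]


# Hypothetical ensemble: one label vector per algorithm/hyperparameter setting.
ensemble = [
    [0, 0, 0, 1, 1, 1],  # configuration A
    [1, 1, 1, 0, 0, 0],  # configuration B: same partition as A, labels permuted
    [0, 0, 1, 1, 2, 2],  # configuration C: over-segmented
    [0, 1, 0, 1, 0, 1],  # configuration D: largely unrelated partition
]
scores = score_ensemble(ensemble)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Because NMI is invariant to label permutations, configurations A and B receive the same score, and the configuration with the highest score is the candidate the ensemble as a whole supports most strongly.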

