MLLGNov 3, 2021

Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

arXiv:2111.02302v3
Originality Incremental advance
AI Analysis

This addresses the challenge of choosing among multiple clustering outputs for researchers and practitioners, though it is incremental as it builds on existing validation methods.

The paper tackles the problem of selecting clustering solutions by proposing a unifying approach based on quadratic discriminant scores, which allows comparison of partitions from different methods and achieves better overall performance in experiments.

Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There exist an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. Moreover, they are often restricted to operate on partitions obtained from a specific method. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant score function and parameters describing clusters' size, center and scatter. We develop two cluster-quality criteria called quadratic scores. We show that these criteria are consistent with groups generated from a general class of elliptically-symmetric distributions. The quest for this type of groups is common in applications. The connection with likelihood theory for mixture models and model-based clustering is investigated. Based on bootstrap resampling of the quadratic scores, we propose a selection rule that allows choosing among many clustering solutions. The proposed method has the distinctive advantage that it can compare partitions that cannot be compared with other state-of-the-art methods. Extensive numerical experiments and the analysis of real data show that, even if some competing methods turn out to be superior in some setups, the proposed methodology achieves a better overall performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes