LG · ML · Jun 15, 2020

Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion

arXiv:2006.08530v3 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental challenge in clustering, choosing the number of clusters, and offers an incremental improvement over existing stability-based approaches.

The paper tackles the problem of selecting the number of clusters K in non-parametric clustering by proposing a new validation criterion based on stability trade-offs, which overcomes limitations of previous stability-based methods and is empirically demonstrated to be effective.

Model selection is a major challenge in non-parametric clustering. There is no universally accepted way to evaluate clustering results, for the obvious reason that no ground truth is available. The difficulty of finding a universal evaluation criterion is a consequence of the ill-defined objective of clustering. From this perspective, clustering stability has emerged as a natural and model-agnostic principle: an algorithm should find stable structures in the data. If data sets are repeatedly sampled from the same underlying distribution, an algorithm should find similar partitions. However, stability alone is not well suited to determining the number of clusters. For instance, it is unable to detect whether the number of clusters is too small. We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition. This principle leads to a novel clustering validation criterion based on between-cluster and within-cluster stability, overcoming limitations of previous stability-based methods. We empirically demonstrate the effectiveness of our criterion in selecting the number of clusters and compare it with existing methods. Code is available at https://github.com/FlorentF9/skstab.
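
The principle in the abstract (a clustering should be stable, while no cluster should contain a stable partition of its own) can be illustrated with a small standard-library sketch. It is not the skstab API and not necessarily the paper's exact criterion. The assumptions made here are: k-means as the base algorithm, random subsampling as the perturbation, the Rand index as the partition similarity, only 2-way splits when probing inside a cluster, and a plain difference (between-cluster minus within-cluster stability) as the trade-off. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def predict(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def fit_kmeans(points, k, rng, iters=25):
    """Lloyd's k-means with farthest-point initialisation (an assumption)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p, lab in zip(points, predict(points, centers)):
            groups[lab].append(p)
        # Recompute each center as its group's mean; keep the old one if empty.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, rng, trials=10, frac=0.5):
    """Mean Rand index between clusterings fitted on two random subsamples."""
    scores = []
    for _ in range(trials):
        size = max(k, int(frac * len(points)))
        runs = [predict(points, fit_kmeans(rng.sample(points, size), k, rng))
                for _ in range(2)]
        scores.append(rand_index(*runs))
    return sum(scores) / len(scores)


def within_stability(points, k, rng):
    """Size-weighted stability of a 2-way split inside each of the k clusters."""
    labels = predict(points, fit_kmeans(points, k, rng))
    total = 0.0
    for lab in range(k):
        cluster = [p for p, l in zip(points, labels) if l == lab]
        if len(cluster) >= 4:  # too-small clusters contribute zero
            total += len(cluster) * stability(cluster, 2, rng)
    return total / len(points)


def make_blobs(rng, centers=((0, 0), (4, 0), (20, 0)), n=40, std=0.3):
    """Toy data: two nearby blobs and one distant blob."""
    return [(rng.gauss(cx, std), rng.gauss(cy, std))
            for cx, cy in centers for _ in range(n)]


rng = random.Random(0)
data = make_blobs(rng)
for k in (2, 3, 4):
    between = stability(data, k, rng)
    within = within_stability(data, k, rng)
    print(k, round(between, 3), round(within, 3), round(between - within, 3))
```

On toy data like this, K=2 and K=3 should both look stable on their own, because the two nearby blobs merge the same way in every subsample. The within-cluster term is what penalizes K=2: the merged cluster can itself be split stably into its two blobs, which is the failure mode the abstract attributes to stability alone.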

Code Implementations: 1 repo
