MLLGDec 27, 2016

Clustering with Confidence: Finding Clusters with Statistical Guarantees

arXiv:1612.08714v24 citations
Originality Incremental advance
AI Analysis

This addresses the need for statistical guarantees in clustering for researchers and practitioners dealing with unstable clustering results, though it is incremental as it builds on existing clustering and classification algorithms.

The authors tackled the problem of clustering instability by proposing a method to quantify robustness and identify core clusters with a statistical guarantee that each data item's co-occurrence probability within a cluster is at least 1 - α. They demonstrated the method's effectiveness on simulated and real datasets, showing that the clusters meet the robustness guarantees.

Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or re-running a clustering algorithm involving some stochastic component may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least $1 - α$. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes