ML · LG · Oct 19, 2023

DCSI -- An improved measure of cluster separability based on separation and connectedness

arXiv:2310.12806v4 · 8 citations · h-index: 7
Originality: Incremental advance
AI Analysis

DCSI gives researchers and practitioners a way to assess whether a clustering algorithm is suitable for a real-world data set. The contribution is incremental, since it builds on existing separability concepts.

The paper addresses the problem of evaluating whether class labels correspond to meaningful clusters for density-based clustering. It develops DCSI, a measure based on separation and connectedness that correlates strongly with DBSCAN performance on synthetic data but lacks robustness on multi-class data sets with overlapping classes.

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
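The abstract scores DBSCAN performance with the adjusted Rand index (ARI), which compares a clustering against the class labels while correcting for chance agreement. The following is a minimal sketch of that index using only the Python standard library. The function name `adjusted_rand_index` is illustrative and not taken from the paper or its code, and the block does not implement DCSI itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two points: the partitions agree trivially.
        return 1.0

    # Contingency counts: points sharing a (class, cluster) combination.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs placed together in each view.
    pairs_both = sum(comb(c, 2) for c in cell_counts.values())
    pairs_true = sum(comb(c, 2) for c in true_counts.values())
    pairs_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = pairs_true * pairs_pred / total_pairs
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put every point in one group.
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Same grouping under different label names gives perfect agreement.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that splits every class scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An ARI of 1 means the clustering reproduces the class labels exactly, and values near 0 indicate agreement no better than chance. This sketch treats every distinct label as its own group, so a noise label (scikit-learn's DBSCAN uses -1) would count as one more cluster unless it is handled separately.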

Code Implementations: 1 repo