A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Ryan DeWolfe, Paweł Prałat, François Théberge

arXiv:2602.14855v1, 2.71 citations, h-index: 13

Originality: Synthesis-oriented

AI Analysis

The paper addresses a gap in the extrinsic evaluation of clustering algorithms, though the contribution appears incremental because it builds on existing comparison measures.

It tackles the problem of comparing clusterings that include overlaps and outliers, for which no established methods exist, by defining a similarity measure with several desirable properties and showing experimentally that it avoids common biases.

Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general setting, the detected and ground truth clusterings may have outliers (objects belonging to no cluster), overlapping clusters (objects may belong to more than one cluster), or both, but methods for comparing these clusterings are currently undeveloped. In this note, we define a pragmatic similarity measure for comparing clusterings with overlaps and outliers, show that it has several desirable properties, and experimentally confirm that it is not subject to several common biases afflicting other clustering comparison measures.
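
To make the setting concrete, the following is a minimal Python sketch of one way to represent clusterings that contain outliers and overlaps. It is not taken from the paper and does not implement the paper's similarity measure. The function names (`membership_counts`, `outliers`, `overlapping`) and the toy data are illustrative assumptions. A clustering is modeled here as a list of sets over a shared universe of objects, so an object may appear in zero, one, or several clusters.

```python
from collections import Counter


def membership_counts(universe, clusters):
    """Count how many clusters each object of the universe belongs to."""
    counts = Counter({obj: 0 for obj in universe})
    for cluster in clusters:
        for obj in cluster:
            counts[obj] += 1
    return counts


def outliers(universe, clusters):
    """Objects belonging to no cluster."""
    counts = membership_counts(universe, clusters)
    return {obj for obj in universe if counts[obj] == 0}


def overlapping(universe, clusters):
    """Objects belonging to more than one cluster."""
    counts = membership_counts(universe, clusters)
    return {obj for obj in universe if counts[obj] > 1}


# Hypothetical toy data: eight objects, two clusterings of the same universe.
universe = set(range(8))
ground_truth = [{0, 1, 2, 3}, {3, 4, 5}]   # object 3 overlaps; 6 and 7 are outliers
detected = [{0, 1, 2}, {2, 3, 4, 5, 6}]    # object 2 overlaps; 7 is an outlier

print(outliers(universe, ground_truth), overlapping(universe, ground_truth))
print(outliers(universe, detected), overlapping(universe, detected))
```

Under this representation, a partition is the special case in which every membership count equals one. A comparison measure for the general setting has to handle both clusterings having different outlier and overlap sets, as in the toy data above.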
