DISCO: Internal Evaluation of Density-Based Clustering
This work addresses a gap in internal evaluation for density-based clustering. The contribution is incremental in that it extends existing methods to include noise assessment.
The paper tackles the evaluation of density-based clustering results, in particular the quality of noise labels. It proposes DISCO, the first cluster validity index (CVI) that assesses noise labels in addition to cluster compactness and separation, and it reports more consistent evaluation than competing indices.
In density-based clustering, clusters are areas of high object density separated by lower object density areas. This notion supports arbitrarily shaped clusters and automatic detection of noise points that do not belong to any cluster. However, it is challenging to adequately evaluate the quality of density-based clustering results. Even though some existing cluster validity indices (CVIs) target arbitrarily shaped clusters, none of them captures the quality of the labeled noise. In this paper, we propose DISCO, a Density-based Internal Score for Clustering Outcomes, which is the first CVI that also evaluates the quality of noise labels. DISCO reliably evaluates density-based clusters of arbitrary shape by assessing compactness and separation. It also introduces a direct assessment of noise labels for any given clustering. Our experiments show that DISCO evaluates density-based clusterings more consistently than its competitors. It is additionally the first method to evaluate the complete labeling of density-based clustering methods, including noise labels.
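To make the gap concrete, the following is a minimal sketch of how a generic CVI behaves when a labeling contains noise. It does not implement DISCO. It uses a plain silhouette width as a stand-in for an existing index, assumes the common convention that the label -1 marks noise, and runs on hypothetical toy data. The helper names (`silhouette`, `drop_noise`) are illustrative only.

```python
import math

NOISE = -1  # assumed convention: -1 marks noise points


def silhouette(points, labels):
    """Mean silhouette width, treating every distinct label as a cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def drop_noise(points, labels):
    """Workaround 1: discard noise points before scoring."""
    kept = [(p, l) for p, l in zip(points, labels) if l != NOISE]
    return [p for p, _ in kept], [l for _, l in kept]


# Two compact clusters (hypothetical toy data).
squares = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
# Case A: two isolated points, plausibly labeled as noise.
sparse_noise = [(5, 5), (20, 0)]
# Case B: a dense group that is questionably labeled as noise.
dense_noise = [(20, 0), (20, 1), (21, 0), (21, 1)]

labels_a = [0] * 4 + [1] * 4 + [NOISE] * len(sparse_noise)
labels_b = [0] * 4 + [1] * 4 + [NOISE] * len(dense_noise)

# Workaround 1 (drop noise) cannot tell case A from case B.
score_a_filtered = silhouette(*drop_noise(squares + sparse_noise, labels_a))
score_b_filtered = silhouette(*drop_noise(squares + dense_noise, labels_b))

# Workaround 2 (treat noise as one more cluster) penalizes case A,
# because scattered noise points do not form a compact cluster.
score_a_as_cluster = silhouette(squares + sparse_noise, labels_a)

print(score_a_filtered, score_b_filtered, score_a_as_cluster)
```

In this sketch, dropping the noise points gives identical scores for a plausible and a questionable noise labeling, while treating noise as an ordinary cluster lowers the score of the plausible one. Neither workaround assesses the noise labels themselves, which is the part of the labeling that the abstract says DISCO evaluates directly.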