A New Index for Clustering Evaluation Based on Density Estimation
This work addresses the need for better internal clustering evaluation metrics in data analysis, although the contribution appears incremental because it builds on existing methods.
The authors address internal clustering evaluation by introducing a new index based on density estimation, which is reported to significantly outperform six existing indices on 145 datasets.
A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index, $I_a$, is called the Ambiguous Index; the second sub-index, $I_s$, is called the Similarity Index. Both sub-indices are calculated from a density estimate of each cluster in a partition of the data. An experiment is conducted to test the performance of the new index and to compare it with six other internal clustering evaluation indices (Calinski-Harabasz index, Silhouette coefficient, Davies-Bouldin index, CDbw, DBCV, and VIASCKDE) on a set of 145 datasets. The results show that the new index significantly outperforms the other internal clustering evaluation indices.
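To make the phrase "density estimation of each cluster" concrete, the following is a minimal sketch of fitting one density estimate per cluster of a given partition. It is an illustration only, not the authors' method: the abstract does not define $I_a$ or $I_s$, so neither is computed here. The isotropic Gaussian kernel, the fixed `bandwidth` value, the toy data, and the helper names `gaussian_kde` and `fit_cluster_densities` are all assumptions introduced for this example.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a density function: isotropic Gaussian KDE over `samples`.

    Assumption: a single fixed bandwidth shared by all dimensions.
    """
    n = len(samples)
    d = len(samples[0])
    # Normalizing constant of n Gaussian kernels in d dimensions.
    norm = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** d

    def density(x):
        total = 0.0
        for s in samples:
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return density


def fit_cluster_densities(points, labels, bandwidth=0.5):
    """Group points by cluster label and fit one KDE per cluster."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: gaussian_kde(members, bandwidth)
            for lab, members in groups.items()}


# Hypothetical toy partition: two well-separated clusters in 2D.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

densities = fit_cluster_densities(points, labels)

# Evaluate every cluster's density at a query point near cluster 0.
query = (0.1, 0.1)
for lab, f in sorted(densities.items()):
    print(lab, f(query))
```

Comparing how strongly each cluster's estimated density responds at a given point is one plausible building block for density-based indices of this kind; how the paper actually combines such estimates into its two sub-indices is not stated in the abstract.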