Absolute indices for determining compactness, separability and number of clusters
This work addresses cluster validation for data analysts by proposing absolute indices in place of the relative indices in common use, though the contribution appears incremental because it extends the existing line of cluster validity indices.
The paper tackles the problem of identifying true clusters in data by introducing novel absolute cluster indices for compactness and separability. These indices are used to determine the optimal number of clusters, and their performance is demonstrated on synthetic and real-world data sets.
Finding "true" clusters in a data set is a challenging problem. Clustering solutions obtained using different models and algorithms do not necessarily provide compact and well-separated clusters or the optimal number of clusters. Cluster validity indices are commonly applied to identify such clusters. However, these indices are typically relative: they are used to compare clustering algorithms or to choose the parameters of a clustering algorithm. Moreover, their success depends on the underlying data structure. This paper introduces novel absolute cluster indices to determine both the compactness and the separability of clusters. We define a compactness function for each cluster and a set of neighboring points for each pair of clusters. The compactness function is used to determine the compactness of each cluster and of the whole cluster distribution. The set of neighboring points is used to define the margin between clusters and the overall distribution margin. The proposed compactness and separability indices are then applied to identify the true number of clusters. Using a number of synthetic and real-world data sets, we demonstrate the performance of these new indices and compare them with other widely used cluster validity indices.
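To make the two ingredients concrete, the following is a minimal sketch of a per-cluster compactness measure and a between-cluster margin. It is not the paper's definition: the abstract does not state the compactness function or how neighboring points are selected, so the sketch assumes mean Euclidean distance to the centroid as a stand-in for compactness and the smallest point-to-point distance between two clusters as a stand-in for the margin. The function names `cluster_compactness`, `pairwise_margin`, and `summarize` are hypothetical.

```python
import numpy as np


def cluster_compactness(points):
    """Assumed stand-in: mean Euclidean distance of points to their centroid."""
    centroid = points.mean(axis=0)
    return float(np.linalg.norm(points - centroid, axis=1).mean())


def pairwise_margin(a, b):
    """Assumed stand-in: smallest Euclidean distance between a point in a and a point in b."""
    diffs = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), dim)
    return float(np.linalg.norm(diffs, axis=2).min())


def summarize(X, labels):
    """Return per-cluster compactness and the margin for every pair of clusters."""
    ids = [int(k) for k in np.unique(labels)]
    compactness = {k: cluster_compactness(X[labels == k]) for k in ids}
    margins = {
        (i, j): pairwise_margin(X[labels == i], X[labels == j])
        for pos, i in enumerate(ids)
        for j in ids[pos + 1:]
    }
    return compactness, margins


# Toy data: two tight clusters that are far apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
compactness, margins = summarize(X, labels)
print(compactness)
print(margins)
```

Under these assumed definitions, a small compactness value together with a large margin indicates compact, well-separated clusters. Repeating the computation for clusterings with different numbers of clusters is one plausible way such quantities could guide the choice of cluster count.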