LG ML Oct 3, 2018

Determining Optimal Number of k-Clusters based on Predefined Level-of-Similarity

arXiv:1810.01878v2 0.8 Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for automated clustering of streaming data, but the contribution appears incremental: it builds on existing centroid-based methods and pairs them with a specific similarity threshold.

The paper tackles clustering without specifying the number of clusters in advance. It proposes a centroid-based algorithm, applicable to streaming data, that uses a similarity measure to either assign each data-point to an existing cluster or create a new cluster, depending on a predefined Level-of-Similarity.

This paper proposes a centroid-based clustering algorithm that can cluster data-points with n features without requiring the number of clusters to be specified. The core of the algorithm is a similarity measure that decides whether to assign an incoming data-point to an existing cluster or to create a new cluster and assign the data-point to it. The algorithm is application-specific: it applies when a stream of data-points must be clustered such that the similarity between each incoming data-point and the cluster it is assigned to is greater than the predefined Level-of-Similarity.
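
The assign-or-create rule described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not define the similarity measure or the centroid update, so the function names, the similarity 1 / (1 + Euclidean distance), and the running-mean centroid update are all assumptions made for this example.

```python
import math


def similarity(point, centroid):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance).

    The paper's actual measure is not given in the abstract.
    """
    return 1.0 / (1.0 + math.dist(point, centroid))


def cluster_stream(points, level_of_similarity):
    """Assign each incoming point to its most similar cluster, or open a new one."""
    centroids = []  # running mean of each cluster (assumed update rule)
    counts = []     # number of points assigned to each cluster
    labels = []     # cluster index of each point, in arrival order

    for point in points:
        # Find the existing cluster whose centroid is most similar to the point.
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(centroids):
            s = similarity(point, centroid)
            if s > best_sim:
                best, best_sim = i, s

        if best >= 0 and best_sim > level_of_similarity:
            # Similar enough: join the cluster and update its running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n
                               for c, x in zip(centroids[best], point)]
        else:
            # No cluster exceeds the threshold: start a new cluster here.
            centroids.append(list(point))
            counts.append(1)
            best = len(centroids) - 1

        labels.append(best)

    return labels, centroids


stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = cluster_stream(stream, level_of_similarity=0.5)
print(labels)          # [0, 0, 1, 1, 0]
print(len(centroids))  # 2
```

Under these assumptions, raising the threshold makes the rule stricter: with level_of_similarity=0.95 the same five points each start their own cluster, which shows how the predefined Level-of-Similarity, rather than a chosen k, determines the number of clusters.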
