LG ML | Feb 16, 2013

Clustering validity based on the most similarity

Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

arXiv:1302.3956v1 | 1 citation

Originality: Incremental advance

AI Analysis

This work addresses a challenge in clustering for large systems where data is not fully available initially, offering an incremental solution.

The paper tackles the problem of evaluating clustering results when data arrive incrementally. It proposes a new validity measure that selects the clustering repeated most often across different initial values, so that the choice of the optimal clustering is independent of the input parameters.
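The repetition idea in the summary can be illustrated with a small sketch. This is not the authors' algorithm. It assumes that "initial values" means random seeds for k-means and that two runs "repeat" when they produce the same partition up to a relabeling of clusters. The functions `kmeans`, `canonical`, `repetition_score`, and `select_k` are hypothetical names introduced only for this illustration.

```python
import random
from collections import Counter


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means on 2-D tuples; returns one cluster label per point."""
    rng = random.Random(seed)           # the seed plays the role of the "initial value"
    centers = rng.sample(points, k)     # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Move each center to the mean of its members; keep it in place if empty.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:      # assignments are stable: converged
            break
        centers = new_centers
    return labels


def canonical(labels):
    """Relabel clusters in order of first appearance, so partitions equal up to renaming compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def repetition_score(points, k, seeds):
    """How many seeds reproduce the most frequent partition for this k (hypothetical measure)."""
    counts = Counter(canonical(kmeans(points, k, s)) for s in seeds)
    return counts.most_common(1)[0][1]


def select_k(points, ks, seeds):
    """Hypothetical rule: pick the k whose modal partition repeats most often (earliest k wins ties)."""
    return max(ks, key=lambda k: repetition_score(points, k, seeds))
```

For example, on two tight, well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, every seed yields the same two-cluster split, so `repetition_score(pts, 2, range(10))` reaches its maximum of 10. Counting repeated partitions needs no density threshold, which is one plausible reading of "independent of input parameters".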

One basic requirement of many studies is the need to classify data. Clustering is one proposed method for summarizing networks. Clustering methods can be divided into two categories: model-based approaches and algorithmic approaches. Since most clustering methods depend on their input parameters, it is important to evaluate the results of a clustering algorithm under different input parameters in order to choose the most appropriate one. Several clustering validity techniques, based on the inner density and outer density of clusters, provide metrics for choosing the most appropriate clustering independent of the input parameters. Because previous methods depend on the input parameters, one challenge in large systems is that data are completed incrementally, which affects the final choice of the most appropriate clustering. These methods take high density within a cluster and low density between different clusters as the criterion for choosing the optimal clustering. This criterion has a serious problem: not all data are available at the first stage. In this paper, we introduce an efficient measure under which the maximum number of repetitions across various initial values occurs.
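The abstract contrasts the proposed measure with validity criteria based on high inner density and low outer density. A minimal sketch of such a criterion follows. It is a generic compactness-over-separation ratio chosen for illustration, not a specific published index and not the paper's method. The names `centroid`, `compactness`, `separation`, and `density_index` are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean point of a non-empty list of 2-D tuples."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def compactness(clusters):
    """Mean distance from each point to its own centroid (small = high inner density)."""
    total = sum(math.dist(p, centroid(c)) for c in clusters for p in c)
    return total / sum(len(c) for c in clusters)


def separation(clusters):
    """Smallest distance between two cluster centroids (large = low outer density)."""
    cents = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))


def density_index(clusters):
    """Hypothetical validity index: compactness / separation; smaller is better."""
    return compactness(clusters) / separation(clusters)


good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]   # tight, far-apart clusters
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]    # same points, poorly grouped
print(density_index(good), density_index(bad))  # 0.1 2.5
```

Every term here is computed over all points currently assigned, so the score can shift as new data arrive. That is the sensitivity to incrementally completed data that the abstract points out.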

