IR | Mar 10, 2015

Experimental Estimation of Number of Clusters Based on Cluster Quality

arXiv:1503.03168v1 | 5 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses a known drawback of clustering algorithms for text mining, but its contribution is incremental: it focuses on experimental estimation rather than proposing a new method.

The paper tackles the problem of determining the number of clusters in text clustering, a value that most algorithms require as input. It estimates this number experimentally from cluster quality, using partitional clustering algorithms because they suit large document datasets.

Text clustering is a text mining technique that divides a given set of text documents into significant clusters. It is used to organize a huge number of text documents into a well-organized form. In the majority of clustering algorithms, the number of clusters must be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
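The general idea of choosing the number of clusters from cluster quality can be illustrated with a minimal sketch. The abstract names neither the partitional algorithm nor the quality measure, so everything here is an assumption made for illustration: plain k-means stands in for the partitional algorithm, the mean silhouette coefficient stands in for cluster quality, and the 2-D points in `docs` are hypothetical stand-ins for document vectors. The names `kmeans`, `silhouette`, and `estimate_k` are likewise invented for this example and are not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first seeding (assumed setup)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def estimate_k(points, k_values):
    """Cluster once per candidate k and return the k with the best quality."""
    quality = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(quality, key=quality.get), quality

# Hypothetical toy "document vectors": three well-separated groups.
docs = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0, 10), (0, 11), (1, 10), (1, 11)]

best_k, quality = estimate_k(docs, range(2, 6))
print(best_k, quality)
```

On this toy data the quality score peaks at three clusters, matching the three groups built into `docs`. Real document collections would need a vector representation such as TF-IDF and usually a cosine-based distance, and a different quality measure could shift the chosen value.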

