Experimental Estimation of Number of Clusters Based on Cluster Quality
This paper addresses a known drawback of clustering algorithms for text mining, but the contribution is incremental: it focuses on experimental estimation rather than proposing a new method.
The paper tackles the problem of determining the number of clusters in text clustering, a value that most algorithms require as input. It estimates this number experimentally from cluster quality, using a partitional clustering algorithm suited to large document datasets.
Text clustering is a text mining technique that divides a given set of text documents into meaningful clusters. It is used to organize large numbers of text documents into a structured form. In the majority of clustering algorithms, the number of clusters must be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
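The general idea of choosing the number of clusters from cluster quality can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the abstract names neither the partitional algorithm nor the quality measure, so plain k-means and the mean silhouette score are assumed here, and a few toy 2-D points stand in for document vectors (for example, TF-IDF vectors). The function names `kmeans`, `silhouette`, and `estimate_k` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means (assumed partitional algorithm); returns one label per point."""
    # Deterministic farthest-point seeding so repeated runs give the same result.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each center to the mean of its members (empty clusters keep their center).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette score (assumed quality measure); higher means better clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_values):
    """Cluster once per candidate k and return the k with the best quality score."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups standing in for document vectors.
docs = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = estimate_k(docs, range(2, 7))
print(best_k, scores)
```

The sketch clusters the data once for each candidate number of clusters and keeps the value with the highest quality score. For real document collections, cosine distance on sparse term vectors and a different quality index may be more appropriate than the Euclidean distance and silhouette score assumed here.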