Document Clustering using K-Means and K-Medoids
This paper addresses information overload for general users by providing a method to cluster and summarize documents, though the contribution appears incremental, since it compares existing algorithms rather than proposing a new one.
The paper compared K-means and K-medoids clustering algorithms for document clustering to identify the best approach, then performed document summarization on the resulting clusters using sentence weighting to help users quickly find relevant information.
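The K-means versus K-medoids comparison can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several choices are assumptions. Documents are L2-normalised term-frequency vectors compared with Euclidean distance, and both algorithms share a deterministic farthest-first seeding. K-medoids uses the simple alternating (Voronoi-iteration) update rather than full PAM. The mean silhouette coefficient stands in for whatever "best" criterion the authors used. All function names (`unit_tf_vectors`, `farthest_first`, `kmeans`, `kmedoids`, `silhouette`) are hypothetical.

```python
import math
from collections import Counter

def unit_tf_vectors(docs):
    # Bag-of-words term frequencies, L2-normalised (assumed document representation).
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vecs = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def dist(a, b):
    # Euclidean distance; on unit vectors it is monotone in cosine distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(points, k):
    # Deterministic seeding (assumption: the paper's initialisation is not stated).
    idx = [0]
    while len(idx) < k:
        idx.append(max(range(len(points)),
                       key=lambda i: min(dist(points[i], points[c]) for c in idx)))
    return idx

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=50):
    centers = [list(points[i]) for i in farthest_first(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Centroid = coordinate-wise mean; keep the old center if the cluster is empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[c])
        if new == centers:
            break
        centers = new
    return assign(points, centers)

def kmedoids(points, k, iters=50):
    # Alternating (Voronoi-iteration) K-medoids: simpler than full PAM.
    medoids = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, [points[m] for m in medoids])
        new = []
        for c in range(k):
            members = [i for i, l in enumerate(labels) if l == c] or [medoids[c]]
            # Medoid = member document with the smallest total distance to its cluster.
            new.append(min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new == medoids:
            break
        medoids = new
    return assign(points, [points[m] for m in medoids])

def silhouette(points, labels):
    # Mean silhouette coefficient, one possible "best clustering" criterion (assumed).
    if len(set(labels)) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for l in labels if l == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

docs = ["cat dog pet", "dog cat pet animal", "cat pet",
        "stock market trade", "market stock price", "trade price stock"]
X = unit_tf_vectors(docs)
for name, algo in (("K-means", kmeans), ("K-medoids", kmedoids)):
    labels = algo(X, 2)
    print(name, labels, round(silhouette(X, labels), 3))
```

On this toy corpus both algorithms separate the pet documents from the finance documents. The difference lies in the update step. K-means moves each center to the mean of its members, which need not be an actual document. K-medoids always picks a member document, which makes it less sensitive to outliers, at the cost of computing all pairwise distances within each cluster.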
With the huge upsurge of information in day-to-day life, it has become difficult to gather relevant information quickly. People are always short of time and need everything fast. Clustering was therefore introduced to gather related information into clusters. Of the several algorithms available for clustering information, in this paper we implement the K-means and K-Medoids clustering algorithms and compare them to determine which is better for clustering. On the best clusters formed, document summarization is performed based on sentence weight to focus on the key points of the whole document. This makes it easier for people to find the information they want and thus read only those documents that are relevant from their point of view.
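Summarization based on sentence weight can be sketched in the same hedged spirit. The source does not give the weighting formula, so this sketch assumes a common frequency-based scheme. Each non-stopword is scored by its frequency in the document divided by the highest word frequency. A sentence's weight is the sum of its word scores, and the top-`n` sentences are returned in their original order. The stopword list, the regex sentence splitter, and the names `split_sentences`, `tokenize` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for",
             "on", "with", "that", "this", "are", "was", "be", "as", "by"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def tokenize(sentence):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n=2):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return sentences[:n]
    top = max(freq.values())

    def weight(s):
        # Sentence weight = sum of max-normalised frequencies of its words.
        return sum(freq[w] / top for w in tokenize(s))

    ranked = sorted(range(len(sentences)), key=lambda i: weight(sentences[i]), reverse=True)[:n]
    # Keep the chosen sentences in document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked)]

text = ("Clustering groups similar documents. "
        "Clustering helps users find documents quickly. "
        "The weather was nice.")
print(summarize(text, n=2))
```

Because summed weights favor long sentences, dividing each weight by the sentence's token count is a common variant; which scheme, if either, the paper uses is not stated.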