IR · May 1, 2015

Comparison Clustering using Cosine and Fuzzy set based Similarity Measures of Text Documents

Manan Mohan Goyal, Neha Agrawal, Manoj Kumar Sarma, Nayan Jyoti Kalita

arXiv:1505.00168v1 · 3.26 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for text clustering applications, focusing on comparing similarity measures.

The paper addresses text document clustering by running K-means with fuzzy set based and cosine similarity measures in place of Euclidean distance. It finds that clustering accuracy differs between the two measures, and it uses the start and end times of cluster formation to decide which measure is optimal.

Given the high demand for clustering, this paper focuses on understanding and implementing K-means clustering with two different similarity measures. We cluster the documents using these two measures rather than Euclidean distance. We also compare the fuzzy and cosine similarity measures on clustering accuracy. The start and end times of cluster formation are used to decide the optimum similarity measure.
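The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the fuzzy measure, so the sketch assumes a fuzzy Jaccard form (sum of element-wise minima over sum of element-wise maxima) on term weights scaled to [0, 1]. The toy document vectors, the fixed initial centroids, and the helper names `cosine_similarity`, `fuzzy_similarity`, `kmeans`, and `timed_kmeans` are all hypothetical.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_similarity(a, b):
    """Assumed fuzzy set measure (fuzzy Jaccard): sum(min) / sum(max).

    Treats each weight in [0, 1] as a membership degree. The paper's
    actual fuzzy measure may differ.
    """
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0


def kmeans(vectors, initial_centroids, similarity, max_iter=100):
    """K-means that assigns each vector to its MOST SIMILAR centroid."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda k: similarity(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def timed_kmeans(vectors, initial_centroids, similarity):
    """Run K-means and return (labels, elapsed seconds)."""
    start = time.perf_counter()  # start time of cluster formation
    labels, _ = kmeans(vectors, initial_centroids, similarity)
    end = time.perf_counter()    # end time of cluster formation
    return labels, end - start


# Hypothetical term weights for four documents over a four-term vocabulary.
docs = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.7],
    [0.0, 0.0, 0.8, 1.0],
]
init = [docs[0], docs[2]]  # fixed seeds keep the sketch deterministic

cosine_labels, cosine_time = timed_kmeans(docs, init, cosine_similarity)
fuzzy_labels, fuzzy_time = timed_kmeans(docs, init, fuzzy_similarity)

print("cosine:", cosine_labels, f"{cosine_time:.6f}s")
print("fuzzy: ", fuzzy_labels, f"{fuzzy_time:.6f}s")
```

Passing the similarity function as a parameter keeps the K-means loop identical for both measures, so differences in labels or elapsed time come from the measure alone. On a corpus this small the timings are dominated by noise; a meaningful comparison would need real documents and repeated runs.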
