CL, Jul 20, 2013

Clustering Algorithm for Gujarati Language

arXiv:1307.5393v1, 3 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem for Gujarati language processing, but it is incremental as it adapts clustering techniques to a new linguistic context.

The authors tackle the problem of clustering Gujarati words for stemming by proposing a new algorithm and applying it to a dataset of 50,000 tagged words; the resulting clusters serve as a preprocessing step for root extraction.

Natural language processing is still an active research area, and it now draws researchers worldwide. It involves analyzing a language based on its structure and then tagging each word with its grammatical category. Here we have a set of 50,000 tagged words, and we cluster these Gujarati words using a proposed algorithm that we define ourselves. Many clustering techniques are available, e.g. single linkage, complete linkage, and average linkage. Here the number of clusters to be formed is not known in advance, so it depends entirely on the type of data set provided. Clustering is a preprocessing step for stemming. Stemming is the process of extracting the root from a word, e.g. cats = cat + s, meaning that cat is a noun and s marks the plural form.
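
To make the idea of clustering without a preset number of clusters concrete, here is a minimal sketch of threshold-based single-linkage clustering over words. It is not the authors' proposed algorithm, which the abstract does not describe. The distance measure (`prefix_distance`), the `threshold` value, the helper `shortest_member`, and the English sample words are all assumptions chosen for illustration; a Gujarati word list could be passed in the same way.

```python
from itertools import combinations


def prefix_distance(a: str, b: str) -> float:
    """Illustrative distance in [0, 1]: 0 for identical words, 1 for no shared prefix."""
    if not a and not b:
        return 0.0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return 1.0 - common / max(len(a), len(b))


def single_linkage_clusters(words, threshold):
    """Single linkage with a distance cutoff.

    Two words end up in the same cluster if they are connected by a chain
    of pairs whose distance is at most `threshold`. The number of clusters
    is not given in advance; it falls out of the data and the cutoff.
    """
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if prefix_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())


def shortest_member(cluster):
    """Hypothetical stand-in for a root: the shortest word in the cluster."""
    return min(cluster, key=len)


words = ["cat", "cats", "dog", "dogs", "walk", "walked", "walking"]
clusters = single_linkage_clusters(words, threshold=0.5)
for cluster in clusters:
    print(shortest_member(cluster), cluster)
```

With this cutoff, "cat" and "cats" fall into one cluster and "cat" is picked as the candidate root, mirroring the cats = cat + s example. A real stemmer would need a distance measure and root-selection rule suited to Gujarati morphology.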
