CL · Apr 13, 2018

Automatic Language Identification System for Hindi and Magahi

Priya Rani, Atul Kr. Ojha, Girish Nath Jha

arXiv:1804.05095v1 · 0.57 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific need, improving web crawler accuracy through language identification, but it is incremental: it applies an existing method to a new language pair.

The paper tackles the problem of distinguishing between Hindi and Magahi, two closely related Indo-Aryan languages, using a rule-based language identifier that achieves an accuracy of approximately 86.34%.

Language identification has become a prerequisite for all kinds of automated text processing systems. In this paper, we present a rule-based language identifier tool for two closely related Indo-Aryan languages: Hindi and Magahi. The system currently achieves an accuracy of approximately 86.34%, which we hope to improve in future work. Automatic language identification is important for the accuracy of web crawler output.
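
To make the idea of a rule-based identifier concrete, here is a minimal sketch in Python. It is not the authors' system: the abstract does not describe their rules, so the marker-word lists, the function names `identify` and `accuracy`, and the tie-breaking behavior are all assumptions chosen for illustration. The sketch counts language-specific function words in a sentence and picks the language with more hits, then scores predictions against gold labels as a percentage.

```python
from collections import Counter

# Illustrative marker words only. These short lists are assumptions made
# for this sketch, not the rules or lexicon used in the paper.
HINDI_MARKERS = {"है", "हैं", "था", "थी", "थे", "और", "नहीं"}
MAGAHI_MARKERS = {"हई", "हकै", "हलै", "हथिन", "अउर", "नञ"}


def identify(sentence: str) -> str:
    """Return 'hindi', 'magahi', or 'unknown' based on marker-word counts."""
    # Treat the danda (Devanagari full stop) as whitespace before splitting.
    tokens = sentence.replace("।", " ").split()
    counts = Counter()
    for token in tokens:
        if token in HINDI_MARKERS:
            counts["hindi"] += 1
        if token in MAGAHI_MARKERS:
            counts["magahi"] += 1
    # No markers, or an equal number for both languages: refuse to guess.
    if counts["hindi"] == counts["magahi"]:
        return "unknown"
    return "hindi" if counts["hindi"] > counts["magahi"] else "magahi"


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Percentage of predictions that match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

A call such as `identify("वह घर जा रहा है।")` returns `"hindi"` under these assumed lists because the sentence contains the marker `है`. A real system for such closely related languages would likely need far richer rules (for example, suffix patterns and larger lexicons), since the two languages share much of their vocabulary.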
