CL, Apr 13, 2018

Automatic Language Identification System for Hindi and Magahi

arXiv:1804.05095v1
7 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific need, improving web crawler accuracy through language identification, but it is incremental: it applies an existing method to a new language pair.

The paper tackles the problem of distinguishing Hindi from Magahi, two closely related Indo-Aryan languages, using a rule-based language identifier that achieves an accuracy of approximately 86.34%.

Language identification has become a prerequisite for all kinds of automated text processing systems. In this paper, we present a rule-based language identifier tool for two closely related Indo-Aryan languages: Hindi and Magahi. The system currently achieves an accuracy of approximately 86.34%, which we hope to improve in the future. Automatic language identification is significant for the accuracy of Web Crawler output.
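
The abstract does not spell out the rules themselves, so the following is only a minimal sketch of how a rule-based identifier for two closely related languages could work, assuming a simple count of language-specific function words. The marker lists and the names `HINDI_MARKERS`, `MAGAHI_MARKERS`, `identify_language`, and `accuracy` are hypothetical illustrations and are not taken from the paper.

```python
# Illustrative marker words only. These lists are assumptions made for this
# sketch; they are NOT the rule set used in the paper.
HINDI_MARKERS = {"है", "हैं", "था", "थी", "ने"}
MAGAHI_MARKERS = {"हइ", "हकइ", "हथिन", "हलइ"}


def identify_language(sentence: str) -> str:
    """Label a sentence by counting language-specific marker words."""
    # Treat the danda (sentence-final mark) as whitespace before tokenizing.
    tokens = sentence.replace("।", " ").split()
    hindi_hits = sum(token in HINDI_MARKERS for token in tokens)
    magahi_hits = sum(token in MAGAHI_MARKERS for token in tokens)
    if hindi_hits > magahi_hits:
        return "hindi"
    if magahi_hits > hindi_hits:
        return "magahi"
    # No marker found, or a tie: the rules cannot decide.
    return "unknown"


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

A reported figure such as the 86.34% above would correspond to `accuracy` evaluated over a labelled test set; a real system would likely need far richer rules (suffixes, verb morphology, postpositions) than this word-list check.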
