Automatic Language Identification System for Hindi and Magahi
This work addresses a domain-specific need, improving web crawler accuracy through language identification, but it is incremental because it applies an existing method to a new language pair.
The paper tackles the problem of distinguishing between Hindi and Magahi, two closely related Indo-Aryan languages, using a rule-based language identifier that achieves an accuracy of approximately 86.34%.
Language identification has become a prerequisite for all kinds of automated text processing systems. In this paper, we present a rule-based language identifier tool for two closely related Indo-Aryan languages: Hindi and Magahi. The system currently achieves an accuracy of approximately 86.34%, which we hope to improve in future work. Automatic language identification will contribute significantly to the accuracy of web crawler output.
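To make the idea of a rule-based identifier concrete, the following is a minimal sketch in Python. It is not the system described in the paper, whose rules are not given here. It assumes a single very simple rule: count language-specific marker words and pick the language with more hits. The word lists, the function names `identify` and `accuracy`, and the label `"unknown"` are all hypothetical and chosen only for illustration; a real rule set would need to be compiled and validated by speakers of both languages.

```python
import re

# Illustrative marker words only. These lists are assumptions made for this
# sketch; they are not taken from the paper and are far from complete.
HINDI_MARKERS = {"है", "हैं", "था", "थी", "नहीं", "और", "मेरा", "उसका"}
MAGAHI_MARKERS = {"हइ", "हथिन", "हलइ", "हको", "हम्मर", "ओकर", "तोहर"}

# Runs of Devanagari characters, excluding the danda punctuation marks
# (U+0964 and U+0965) so that sentence-final words tokenize cleanly.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")


def identify(text):
    """Return 'hindi', 'magahi', or 'unknown' based on marker-word counts."""
    tokens = TOKEN_RE.findall(text)
    hindi_hits = sum(token in HINDI_MARKERS for token in tokens)
    magahi_hits = sum(token in MAGAHI_MARKERS for token in tokens)
    if hindi_hits > magahi_hits:
        return "hindi"
    if magahi_hits > hindi_hits:
        return "magahi"
    return "unknown"  # tie, or no marker word found


def accuracy(samples):
    """Fraction of (text, gold_label) pairs that identify() labels correctly."""
    correct = sum(identify(text) == label for text, label in samples)
    return correct / len(samples)
```

An accuracy figure such as the one reported would come from running a function like `accuracy` over a labelled test set. Because the two languages share most of their vocabulary and script, a marker-word rule of this kind will leave many short sentences undecided, which is one reason closely related language pairs are harder than unrelated ones.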