CL | May 6, 2019

English-Bhojpuri SMT System: Insights from the Karaka Model

arXiv:1905.02239v1

Originality: Synthesis-oriented

AI Analysis

This work addresses machine translation for the low-resource Bhojpuri language. The contribution is incremental: it applies existing SMT methods with a linguistic adaptation.

The paper addresses machine translation for the English-Bhojpuri language pair by developing a statistical machine translation (SMT) system that draws on the Karaka model, a linguistic framework, with the aim of improving dependency parsing and translation quality. The abstract reports no concrete numerical results.

This thesis is divided into six chapters: Introduction; Karaka Model and its impacts on Dependency Parsing; LT Resources for Bhojpuri; English-Bhojpuri SMT System: Experiment; Evaluation of EB-SMT System; and Conclusion. Chapter one introduces the PhD research, detailing the motivation of the study, the methodology used, and a review of existing MT work on Indian languages. Chapter two covers the theoretical background of Karaka and the Karaka model, along with previous related work. It also discusses the impact of the Karaka model on NLP and dependency parsing, compares Karaka dependency with Universal Dependency, and briefly outlines how these models are implemented in the SMT system for the English-Bhojpuri language pair.
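To make the Karaka-versus-Universal-Dependency comparison more concrete, here is a minimal Python sketch that relabels Karaka-style dependency arcs with approximate UD relations. The AnnCorra-style tags (k1, k2, k3, ...), the mapping table, and the function name `karaka_to_ud` are illustrative assumptions, not the mapping used in the thesis. Real conversions depend on context such as voice, case markers, and verb class, so they cannot be done purely label-by-label.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified mapping from Paninian karaka labels (AnnCorra-style
# tags) to Universal Dependencies relations. This is an approximation for
# illustration only; real treebank conversions are context-dependent.
KARAKA_TO_UD: Dict[str, str] = {
    "k1": "nsubj",   # karta: agent / doer
    "k2": "obj",     # karma: object / patient
    "k3": "obl",     # karana: instrument
    "k4": "iobj",    # sampradana: recipient
    "k5": "obl",     # apadana: source
    "k7": "obl",     # adhikarana: location (general)
    "k7p": "obl",    # adhikarana: location in place
    "k7t": "obl",    # adhikarana: location in time
}

Arc = Tuple[str, str, str]  # (head, dependent, relation label)


def karaka_to_ud(arcs: List[Arc]) -> List[Arc]:
    """Relabel (head, dependent, karaka_label) arcs with approximate UD labels.

    Labels missing from the table fall back to 'dep', the UD relation for an
    unspecified dependency.
    """
    return [(head, dep, KARAKA_TO_UD.get(label, "dep")) for head, dep, label in arcs]


if __name__ == "__main__":
    # Toy arcs for "Ram ate rice with a spoon" (English glosses, hypothetical).
    arcs = [("ate", "Ram", "k1"), ("ate", "rice", "k2"), ("ate", "spoon", "k3")]
    print(karaka_to_ud(arcs))
```

Running the script relabels the toy arcs as `nsubj`, `obj`, and `obl`. Notice that several distinct karaka roles (instrument, source, location) all collapse into UD's generic `obl`. A flat table like this shows how the semantically richer karaka labels lose distinctions when they are mapped onto UD.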

